REPORT ON A PSYCHOMETRIC MISSION TO CLINICIA*
LEE J. CRONBACH
UNIVERSITY OF ILLINOIS


Of necessity, members of this Society devote their attention primarily to 
the domestic problems of psychometrics. We must think also, however, of our 
foreign relations, for Psychometrikans serve chiefly through their contribution 
to other psychologists. Tonight I wish to discuss particularly our relations with 
a growing power located at a great distance around the psychological world 
from us, namely, Clinicia. 

The recent opening of Clinicia to science offers us a great challenge. For 
while the natives of Clinicia are busy and happy, they are regrettably unchas- 
tened by the stern and moral truths on which our civilization is built. To 
remedy this, some of us are already serving as missionaries, informing Clin- 
icians about the Psychometric way of life. This is a report on our experiences. 

Clinicia differs much from our own land of Psychometrika. One of my 
colleagues, a member of Anthropologists Anonymous, has kindly prepared 
comparable descriptions of both cultures to help us view them with equal
objectivity. He writes:


Psychometrika is a small nation, but possesses technical secrets which give it vast 
power. A small task force of Psychometrikans, armed with devices which seem primitive 
today, conquered the United States Army in 1916; and the young braves of the following 
generation overcame the U. S. Air Force in the 1940’s with a new and awesome weapon 
called the stanine. In times of peace, Psychometrika is a hive of productivity, noted chiefly 
for its output of incisive scientific instruments, incomprehensible mathematical formulas, 
and unfavorable book reviews. 

The culture of Psychometrika is strikingly conservative and formalized, with a tre- 
mendous emphasis on standardization. In the Orthodox temples of Psychometrika, the 
youth are told daily that they should never use a test which is not completely standardized. 
Few Psychometrikans, however, allow this belief to interfere with the sale of tests they 
manufacture. 

For all its interest in norms and standards, the Society is not totally restrictive of 
individuality. Deviations from the norm are indeed given much consideration, provided 
they are standard deviations. 

Religion has no place in Psychometric culture, and members are taught to suspect all 
forms of faith. The nearest to a religious belief is the Psychometrikan’s loyalty to the so- 
called Law of Parsimony. Performing the ritual gyrations prescribed by this law, the
Psychometrikan undergoes a mystical experience in which various deities appear before his
eyes and name themselves to him. These deities, or Factors as they are called in the native
language, are very numerous. Since “Factor Analysis” is a highly personal experience, each
Psychometrikan has his own assemblage of Factors, different from that of his neighbors.
This is one reason why faith in the Factors is slight.

*Given as the Presidential Address of the Psychometric Society, September 5, 1954.

In a culture which profits from technical achievements, anthropologists expect to find 
a strong taboo against communication, intended to guard technical secrets from the outer 
world. In Psychometrika this is indeed the case. New discoveries are shared only within the 
inner aristocracy who can read the peculiar private code in which high-caste Psychometri- 
kans communicate. The origin of this secret language has aroused considerable speculation, 
since the alphabet shows traces of the language of ancient Greece. According to legend, the 
language was created by Pythagoras himself. His manuscript supposedly was preserved for 
2000 years, before a typesetter was found brave enough to commence publication of Psycho- 
metrika. 

In contrast to the forbidding coasts of Psychometrika, the shores of Clinicia are an 
alluring tropical paradise. Only its borders have been penetrated, for the rich yield of these 
shoreline plantations is sufficient to support in comfort the vast population of Clinicians. 

Climate is of fundamental importance to Clinicia’s prosperity and its ideology. Each 
native believes himself personally responsible for the climate; little Clinicians are taught 
special rituals they must perform to keep their climate warm. One climate-maintaining ritual 
is built chiefly around the repetition of the phrase, Uh-huh. This phrase is believed to have 
magical therapeutic effects when given the proper inflection. 

Clinicians have a highly developed verbal culture, and a richness of vocabulary 
unsuspected by the tourist who thinks the Clinical language consists only of Uh-huh and its 
variations. Whereas Psychometrika has roots in Greek culture, many founders of Clinicia 
came from Vienna and have imbued its institutions with the cheerful air of a Mittleuro- 
peanischer Kaffeeklatsch. 

Clinicians are markedly sociable. Strong taboos are imposed against disagreements or 
other barriers to sociality. As a consequence, Clinicia has developed a language of great 
poetic beauty and obscurity. In this obscure language, a Clinician can speak for hours on 
uncertain matters without fear of contradiction. 

Strains on the individual are eliminated in Clinicia. The taboo against disputation 
bolsters self-esteem by guaranteeing that all judgments the Clinician makes will be validated 
by others. And Clinical activities provide opportunities for pleasant companionship—if you 
like people. These inventions provide a fine culture, from the point of view of mental health— 
so fine, indeed, that the anthropologists who visit become permanent residents, and there- 
after talk as obscurely as the natives. 


The views of this anthropologist are not necessarily my own. I am sure 
we Psychometrikans are not secretive, nor lacking in faith. It is just because 
we believe in our system that we undertake, in one manner or another, the 
missionary’s role. 

The missionary to Clinicia has a difficult task, because historic conflicts 
cause many Clinicians to regard us as unsympathetic and even hostile. No 
Psychometrikan can hear without a thrill of pride our motto: If anything 
exists, it can be measured. This is an optimistic creed, and not chauvinistic as 
slogans go. But some among us flaunt the more brazen slogan: If something 
cannot be measured, it does not exist! Some Psychometrikans cannot resist an 
opportunity for a quarrel, or a chance to prove intellectual superiority by 
picking off an easy target. According to Greenwood (3), the arrogance of 
Karl Pearson set back the use of statistics in medical fields for a long while.
We keep such antagonism alive; only a very few years ago, an entire Presidential
address before a measurement society was devoted to destructive criticism
of a major Clinical research effort.

Antagonism begins because Psychometrikans believe that Clinicians are 
impatient with rigorous reasoning. This is true. Wisely or unwisely, the 
Clinician is trying to solve practical problems for which the present science of 
psychology is grossly inadequate. Because research gains ground so slowly, 
some practitioners do ignore research and substitute testimonials for evidence. 
The Clinician going about his daily business, however, is not the person with 
whom Psychometrikans should communicate. 

There exists a substantial class of Clinical research workers with intelli- 
gence, imagination, and respect for truth, who have influence on practicing 
Clinicians. Clinical researchers feel too much the urgent need for results, and 
try to establish short-cut solutions instead of concentrating on less spectacular 
fundamentals. Moreover, Clinical research is done in a setting which encour- 
ages wishful interpretation of results, excessive a posteriori speculation, spe- 
cial pleading, and an undeniable general soft-headedness. The Clinical worker 
relies too little on rationales and too much on diffuse exploration of too many 
variables with too few subjects. Granting all these criticisms, it is still true that 
the Clinical researcher wants to establish his findings solidly, and he wants to 
improve his research methodology. Many Clinical investigators have already 
been converted to our beliefs, and where their understanding is primitive, they 
want to learn more. 

We often fail to give the help the Clinician is ready for. Particularly this 
occurs when the measurement specialist gives too little attention to the 
Clinical problem and dashes off a procedure to be followed for answering a 
different question from the one asked. Clinicia is a seller’s market for any 
methodological nostrum: factor analysis, Q-technique, discriminant function, 
or some other. Just say that your method deals with patterns of scores, or with 
the total personality, and you will be mobbed by would-be users. However, 
while it is easy to seduce the Clinician, you cannot hold his affections unless 
your glamorous proposals lead to satisfying consummations. If the Clinician 
feeds his data into your process, and then can only puzzle over ground-up 
fragments that come out, he learns not to ask your help again. 

The methodologist must make sure he understands the Clinician’s ques- 
tion before suggesting techniques. This is not easy, especially because the 
Clinician often asks an unanswerable question. He wants to study the “pattern”
of performance, without specifying just which data and relations constitute
a pattern. I had an opportunity recently to ask a group of clinical investigators
just what they meant by “studying the configuration of the data.” I
got six consecutive answers, and each answer was different. One person wanted
to study ratios of scores; one wanted a best linear composite to predict a
criterion; one wanted to examine the interpretations of a test protocol by an
expert diagnostician; and so on. If faced by ambiguous, inadequately thought- 
out questions, what is the Psychometrikan to say? One thing he cannot say: 
“Here is the meaning your question should have—and I happen to have in 
my knapsack just the remedy for that problem.” Rather, the Psychometrikan 
must proceed, question by question, to help the Clinician define his idea. 

Psychometrikans are in a peculiarly good position to help the Clinician 
just because we take quite different views of the world. The Psychometrikan 
views the world as one of simple relationships. Wherever he looks he perceives 
linear regressions, unit weights, and orthogonal variables. On the other hand 
the Clinician’s first premise is that nature is complicated, too complicated to 
be caught in a simple net. No scientific generalization can take enough things 
into account to satisfy the Clinician. 

Neither philosophy is more correct than the other. The Clinician’s passion 
for complexity is almost certainly a valid way to conceive of the universe. 
The Psychometrikan’s passion for reduction is a practical compromise, to 
simplify problems enough so that scientific methods can come to grips with 
them. 

This conflict of philosophies is an old one. In the late Middle Ages, there 
was a great battle between two schools of philosophic thought. One school 
of thought, identified particularly with William of Ockam, thought that metaphysical
reasoning was trapped in the lush overgrowth of Scholastic philosophy.
Ockam’s followers hacked indiscriminately at this growth with the famous 
“Razor” of Parsimony. The principal sufferers from the Ockamite attacks were 
followers of Duns Scotus, who had freely invented constructs and ingenious 
distinctions to explain circumstances. But Ockam attacked constructs and 
complexities, with a logic that devastated Scholasticism (6). Into the ensuing 
vacancy marched inductive science, with Newton and Leibnitz carrying the 
flag of Parsimony. 

Today’s Clinician joins the Scholastic in saying, “There are more things 
in heaven and earth, Horatio, than are dreamt of in thy philosophy.” Are we
intolerant if, as scientists and therefore inheritors of Ockam, we scorn this 
view? The Ockamite says we should ignore those things not absolutely neces- 
sary to explain observations. Why prate, says he, of non-linear regressions or 
configurational relationships where no one has proved them? Wait, he coun- 
sels, until they force themselves on our attention. His opponent argues that 
we should look for these relations because in a complex world it would be 
remarkable if such relations were absent (5). Admittedly, it takes vast amounts
of data to establish such complex relations with confidence. But should we 
turn our backs upon the possibility? 

Modern statistics has shown that Ockam’s position is arbitrary. In testing 
a hypothesis, we run two kinds of risk: of accepting the hypothesis when it is 
untrue, of rejecting it when it is true. Ockam, with his advice never to reject
the null hypothesis praeter necessitatem, said in effect that he was willing to
risk any amount of error of the second kind, provided he could set an infini- 
tesimal P-value on the risk of error of the first kind. The choice of risks is not 
a matter of absolute principle, as Ockam implied. It is a matter of research 
strategy in a particular situation (2). The problem is one of balancing the risk 
of overlooking fresh trails against the risk of chasing phantoms. Clinicians are 
conditioned to fear the first of these errors; Psychometrikans abhor the second. 
These opposing biases can between them set a good scientific course. 


When the Psychometrikan contributes to Clinical progress, he is rewarded 
by pride in the power of his methods and concepts. But commerce with a 
foreign land cannot be long sustained unless it is a two-way commerce. 
Clinicia has an exportable surplus of one commodity, namely, measurement 
and research problems. Fortunately, these problems make the best of raw 
material for research in psychometrics. Working with Clinicians we often 
learn that our present measurement theories are inadequate or incorrect in 
some basic particular. 

Repeatedly a serious attempt to state a Clinician’s problem in psycho- 
metric terms has shown that his thinking is defensible even though it violates 
our beliefs. Clinical conclusions frequently are incorrect. But when a practicing 
psychologist believes in a conclusion which conflicts with accepted generaliza- 
tions in psychometrics, we need to look for built-in limitations in our theoretical 
model which may make it inconsistent with reality. The usual outcome of such 
scrutiny is neither to accept nor reject the practitioner’s conclusion as it 
stands, but rather to restate it precisely and establish the conditions under 
which it holds. 

Our restatement of Ockam’s problem has already illustrated how more 
comprehensive models improve thinking. Work with utility theory under our 
current ONR contract provides further instances and illustrates how psychometric
results can grow out of clinical perplexities.

In industrial, clinical, and educational testing, grossly unreliable scores 
or differences between scores are used as bases for decision. Psychometric 
doctrine condemns this practice. It cannot be dismissed outright, however, 
because there are too many instances where critical and able testers find it 
profitable. No skilled Binet tester, for instance, would ignore bizarre answers 
on single items or signs of undue dependence on examiner approval, although 
these indicators are unreliable. If careful Clinical testers use indicators which 
our theory condemns, could our theory be wrong? 

An answer was first suggested by Shannon’s information theory. That 
theory points out that a communication device—and a test is one—can be 
evaluated in terms of bandwidth and fidelity. Hitherto, only fidelity, i.e.
accuracy, has concerned psychometrics. Bandwidth refers to the number of
messages a channel carries, or the number of questions a message can answer.
Both bandwidth and fidelity are desirable. According to Shannon, we increase
one at the expense of the other. A homogeneous, Guttman-type test yields 
high precision, but covers little ground. The interview, the essay-test, and the 
projective test are wideband devices: great breadth, little precision. Applying 
a decision-theory or utility model to such concepts has permitted rigorous 
analysis of a large number of problems. (This work has been performed under 
contract N6ori-07146 with the Office of Naval Research.) 

Should indicators of low validity be used as a basis for decisions? Restate 
that in these terms: If you have a fixed testing time, would it be better to use 
many short tests covering different dimensions, or one accurate test? To 
answer this question, our theory requires that you specify the mathematical 
nature of the decision process. To arrive at one illustrative result, let us specify 
that a counselor is going to make a large number of decisions about a high- 
school student: decisions about courses, about remedial help, about disciplin- 
ary treatment, etc. Each is a Yes-No decision. Test questions bearing on each 
decision are available. Detailed assumptions are required. 

Specifically, we assume that the utility of assigning a person to any treat- 
ment is linearly related to his score on some criterion scale, that this criterion 
and the test have a normal bivariate distribution, and that the proportion of 
persons assigned to a treatment is the same for each decision. For fixed length 
of test, tests have uniform reliability, uniform validity against their own 
criterion, and zero validity against others. The treatment under consideration 
is specified before the test is introduced; this procedure is to be distinguished 
from adaptive treatment assumptions, where the treatment is modified ac- 
cording to the adequacy of the test as a basis for classification. The problem 
can be solved for any alternative assumptions if these are unacceptable. 

FIGURE 1
Change in Validity as Test Is Lengthened [After (4)] 


Figure 1 shows the change in validity as a test is lengthened. [Values 
assumed for validity and reliability of the unit test (k = 1) are .10 and .30, 
respectively.] In the decision problem assumed, utility is proportional to 
validity (cf. 1). If our total testing time is used for one test ten units long, the 
gain from the test is shown at point a. Here we have good information for one 
decision, but the others are made on a chance basis. If two 5-unit tests bearing 
on two decisions are used, each one contributes the utility shown at k = 5 
on the scale (b). Hence if n equal tests are given, such that their combined
length is ten units long, the utility of the combined tests follows the upper 
curve in Figure 2. If this represented the true situation, we would find it profit- 
able to give a separate test for each decision no matter how short the test is. 

FIGURE 2
Combined Utility of Tests of Fixed Total Length








But cost must be considered. Assume an initial cost $c_0$ of setting up a
test, giving instructions, etc. Beyond that, assume that the cost of giving a
test increases by $c_1$ for each unit increase in length. Then the straight line in
Figure 2 shows the assumed cost of n equal tests of fixed total length. By 
subtracting cost from the top curve, we have the lower curve showing net 
utility from each set of tests. Figure 3 is similar, with a lower cost. There is a 
best value of n which represents the number of variables the counselor in our
problem should attempt to measure. 
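
The argument of Figures 1 and 2 can be sketched numerically. The following is a minimal illustration, not the computation actually used for the figures: it assumes the usual formula for the validity of a lengthened test, takes utility to be equal (rather than merely proportional) to validity, and uses hypothetical cost values. The function names and the symbols `c0` and `c1` in the code are labels chosen for this sketch.

```python
import math

UNIT_VALIDITY = 0.10      # validity of the unit test (k = 1), as stated in the text
UNIT_RELIABILITY = 0.30   # reliability of the unit test (k = 1), as stated in the text
TOTAL_LENGTH = 10         # total testing time, in unit-test lengths


def validity(k, v1=UNIT_VALIDITY, r1=UNIT_RELIABILITY):
    """Validity of a test k units long (assumed standard lengthening formula)."""
    return v1 * math.sqrt(k) / math.sqrt(1.0 + (k - 1.0) * r1)


def combined_utility(n, total=TOTAL_LENGTH):
    """Utility, in validity units, of n equal tests, one for each decision."""
    return n * validity(total / n)


def net_utility(n, c0, c1, total=TOTAL_LENGTH):
    """Combined utility minus cost: c0 per test set up, c1 per unit of length."""
    return combined_utility(n, total) - (n * c0 + total * c1)


def optimal_n(c0, c1, total=TOTAL_LENGTH):
    """Number of tests (1..total) giving the highest net utility."""
    return max(range(1, total + 1), key=lambda n: net_utility(n, c0, c1, total))


if __name__ == "__main__":
    # Hypothetical costs, expressed in the same units as utility.
    for c0 in (0.10, 0.05):
        print("c0 =", c0, "best n =", optimal_n(c0, 0.01))
```

Under these assumptions the combined utility rises steadily with n, and only the cost term produces an interior optimum; lowering `c0` moves the optimum toward more, shorter tests.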

We have established in principle that there is an optimal bandwidth, for
any $c_0$ and $v_1/r_{11}$. The lower $c_0$ is, the greater the optimum bandwidth. It is
worth sacrificing fidelity to attain this bandwidth. This evidence tends to 
justify the Clinician’s reliance on the interview, for example. An interview 
gives fallible answers, but it compensates by increased bandwidth. Our results 
also post the warning that it is unwise to cover too many questions too hastily. 
Bandwidth can be too wide. With our general formulation, we can ask in any 
given decision problem exactly how far one should sacrifice fidelity. We end, 
not by condemning the procedure of the practical psychologist, but only by 
specifying its limits. We can accept his general aim and indicate how he can 
best attain it. 


Permit me, in closing, to spell out the obvious moral: Psychometric 
missions to Clinicia must continue. We should, as friends of psychology, offer 
every contribution we can to improvement in Clinical research and practice. 
But the far more commanding reason for working with Clinicia is that we gain 
thereby a deeper mastery of those arts for which the Psychometric Society 
stands. 


AN EMPIRICAL COMPARISON OF RESTRICTED AND
GENERAL LATENT DISTANCE ANALYSIS*

DAVID G. HAYS
LABORATORY OF SOCIAL RELATIONS
HARVARD UNIVERSITY

AND

EDGAR F. BORGATTA
RUSSELL SAGE FOUNDATION
Latent distance analysis provides a probability model for the non-perfect 
Guttman scale; the restricted latent distance structure is simpler to compute 
than the general structure. Since no sampling theory for latent structure 
analysis is available, the advantages of the general structure cannot be ex- 
pressed formally. The two structures are compared in terms of their fit to fif- 
teen sets of empirical data. The computation schemes used are summarized. 


1. Introduction 


In the last decade, Paul F. Lazarsfeld and his students have developed
the area known as latent structure analysis. Much of this work has not been 
published formally; the most complete source is in Rand reports (2), but this 
is not generally accessible. Measurement and Prediction (3) is a relatively early 
treatment, but the only one generally available. 

In Measurement and Prediction, the special case called latent distance analysis
is treated, and its relationship to Guttman (cumulative) scaling is described.
A Guttman scale is said to exist if, given N dichotomized items ordered
by frequency of positive response, each respondent answering the $i$th question
positively also responds positively to the $i - 1$ items with greater positive
response frequencies. Thus, all of the data can be reproduced, given only one
quantity, or parameter, for each item: its positive frequency.
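
The definition can be checked mechanically. The sketch below is only an illustration of the definition just given; the function name and the coding of responses as 1 (positive) and 0 (negative) are assumptions made for the example.

```python
def is_perfect_guttman_scale(responses):
    """Return True if every respondent's answers are cumulative.

    responses: list of equal-length lists of 0/1 answers, one list per respondent.
    Items are first ordered from most to least frequently endorsed; the scale is
    perfect if no respondent endorses an item after rejecting a more popular one.
    """
    n_items = len(responses[0])
    totals = [sum(r[i] for r in responses) for i in range(n_items)]
    order = sorted(range(n_items), key=lambda i: -totals[i])
    for r in responses:
        ordered = [r[i] for i in order]
        if any(ordered[t] < ordered[t + 1] for t in range(n_items - 1)):
            return False
    return True


if __name__ == "__main__":
    perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(is_perfect_guttman_scale(perfect))
    print(is_perfect_guttman_scale(perfect + [[0, 1, 0]]))
```

When the check succeeds, each respondent's whole pattern follows from the number of items endorsed, which is why the item frequencies suffice to reproduce the data.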

The researcher who chooses a cumulative model for scale analysis will
be most satisfied if the one-parameter Guttman model describes his data
adequately. If it does not, he may wish to use one of the latent distance
models. Guttman scales rarely are found, and Guttman’s criteria for dealing
with imperfect scales are not entirely clear. Latent distance analysis offers a
rationale for the “almost” cumulative scale. The number of parameters which
must be known in order to reproduce the data is increased to two, in the case
of the restricted model, or to three, in the case of the general model.

*This research, carried out at Harvard Laboratory of Social Relations, was supported
in part by the United States Air Force under Contract AF33 (038)-12782 monitored by the
Human Resources Research Institute. Permission is granted for reproduction, translation,
publication and disposal in whole and in part by or for the United States Government. We
are grateful to Paul F. Lazarsfeld for allowing us to draw from his work and to Frederick
C. Mosteller for reading an earlier version of the manuscript and offering valuable suggestions.

Both the restricted and general models entail considerable computation. 
Our purpose in this paper is to assist the researcher in determining which of 
the two latent structures to compute in a given empirical situation. A general 
mathematical solution of this problem is perhaps possible, but as yet, none 
has been discovered. Therefore we present the results of simultaneous computation
of both latent distance models on a sample of fifteen non-perfect
Guttman scales.


2. Computing Scheme


The restricted latent distance model is described in (3, pp. 441-446); the 
general model is described in (2). Since the latter source is not readily avail- 
able, and since new, parallel solutions of both models have recently been 
developed, we briefly describe the manner of computation of both models. A 
complete account of these new solutions will be contained in a monograph now 
in preparation by Lazarsfeld for the Rand Corporation. 

The parameters of each item are, in the general case, denoted by $a_i$, $b_i$, $c_i$.
These are related to the first-order positive frequencies by the equation

$$p_i = (a_i - b_i)c_i + (a_i + b_i)(1 - c_i). \tag{1}$$

The cross-products of the empirical data are defined by

$$[ij] = p_{ij} - p_i p_j. \tag{2}$$

The latent parameters are related to the cross-products by the equation

$$[ij] = 4b_i b_j c_i(1 - c_j). \tag{3}$$

A third-order symmetric parameter is also defined:

$$[ijk] = p_{ijk} - p_i[jk] - p_j[ik] - p_k p_{ij}. \tag{4}$$

In the restricted case, it is assumed that $a_i = \tfrac{1}{2}$ for all $i$, and the third-order
symmetric parameters are not needed for computation of the latent parameters.

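
The manifest quantities of equations (2) and (4) can be obtained directly from a table of responses. The following is a minimal sketch, assuming responses coded 1 (positive) and 0 (negative) with items indexed from zero; the function names are hypothetical and the data are invented for illustration.

```python
def positive_frequency(responses, items):
    """Proportion of respondents answering every item in `items` positively."""
    return sum(all(r[i] for i in items) for r in responses) / len(responses)


def cross_product(responses, i, j):
    """Equation (2): [ij] = p_ij - p_i p_j, for two different items i and j."""
    p_i = positive_frequency(responses, (i,))
    p_j = positive_frequency(responses, (j,))
    return positive_frequency(responses, (i, j)) - p_i * p_j


def third_order(responses, i, j, k):
    """Equation (4): [ijk] = p_ijk - p_i[jk] - p_j[ik] - p_k p_ij."""
    p_i = positive_frequency(responses, (i,))
    p_j = positive_frequency(responses, (j,))
    p_k = positive_frequency(responses, (k,))
    return (positive_frequency(responses, (i, j, k))
            - p_i * cross_product(responses, j, k)
            - p_j * cross_product(responses, i, k)
            - p_k * positive_frequency(responses, (i, j)))


if __name__ == "__main__":
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # invented example
    print(cross_product(data, 0, 1), third_order(data, 0, 1, 2))
```

Only off-diagonal cross-products are computed this way; the diagonal values required by the model are not observable and must be estimated separately.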
Because of the cumulative nature of the model, it is necessary to order
the items of the scale in terms of the latent parameter $c_i$ in such a way that

$$c_1 \le c_2 \le \cdots \le c_n. \tag{5}$$

The ordering must be done before the latent parameters are known; but it is
usually true that the proper ordering is obtained by having (if the $p_i$ are not
in the same neighborhood)

$$p_1 > p_2 > \cdots > p_n. \tag{6}$$


Since data available are obtained from samples, they contain random 
error. Furthermore, the mathematical model suggested is, at most, expected 
to approximate the process by which the data were generated. Hence an 
estimation process must initiate computations. 

When the cross-products are arranged in matrix form, the model requires
that all second-order minors (tetrads) having all their elements on or above the
principal diagonal must vanish. This permits a preliminary test of the latent
distance model; furthermore, it enables estimation of the diagonal cross-products,
$[jj]$. Each such element, $[jj]$, is included in one or more minors of the
form described. (Note that $[11]$ and $[nn]$ are exceptions.) With fallible data,
each different minor vanishes for a slightly different value of $[jj]$. These different
values of $[jj]$ are averaged to give an estimate of its value.
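
One way to carry out this averaging is sketched below. It assumes that the minors in question are those formed from rows i, j and columns j, k with i < j < k, so that each minor vanishes when [jj] = [ij][jk]/[ik]; this choice is consistent with equation (3) but is an assumption of the sketch, as are the function names and the hypothetical parameter values used to build a test matrix.

```python
def model_cross_product(b, c, i, j):
    """Cross-product [ij] implied by equation (3); items 0-based, c ascending."""
    lo, hi = min(i, j), max(i, j)
    return 4.0 * b[lo] * b[hi] * c[lo] * (1.0 - c[hi])


def estimate_diagonal(cp, j):
    """Average the values of [jj] at which the minors [ij][jk] - [ik][jj] vanish.

    cp is a square matrix of cross-products whose diagonal is unknown.
    Returns None for the first and last items, which enter no such minor.
    """
    n = len(cp)
    values = [cp[i][j] * cp[j][k] / cp[i][k]
              for i in range(j) for k in range(j + 1, n)]
    return sum(values) / len(values) if values else None


# Hypothetical latent parameters for four items, used only to build a test matrix.
b = [0.30, 0.35, 0.40, 0.30]
c = [0.20, 0.40, 0.60, 0.80]
cross = [[model_cross_product(b, c, i, j) if i != j else None
          for j in range(len(b))] for i in range(len(b))]

if __name__ == "__main__":
    print([estimate_diagonal(cross, j) for j in range(len(b))])
```

With error-free cross-products every minor gives the same value; with sample data the individual values differ and the average serves as the estimate.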

For the general model, the ratio

$$U_j = \frac{[ijk]}{[ik]} \tag{7}$$

is computed for all pairs of indices $i$, $k$ with $i < j < k$. Since $U_j$ should be constant, all values
found are averaged to estimate the value of the ratio.

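Computation of the latent parameters is now possible. The following
results are stated without proof. In the restricted case,

$$b_j = \sqrt{[jj] - p_j(1 - p_j) + \tfrac{1}{4}}; \tag{8}$$

$$c_j = \frac{1 - 2p_j}{4b_j} + \tfrac{1}{2}. \tag{9}$$

For the general case,

$$a_j = p_j + \tfrac{1}{2}U_j; \tag{10}$$

$$b_j = \sqrt{[jj] - p_j(1 - p_j) + \tfrac{1}{2}(2a_j - 1)U_j + a_j(1 - a_j)}; \tag{11}$$

$$c_j = \frac{a_j - p_j}{2b_j} + \tfrac{1}{2}. \tag{12}$$
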

The first and last items of the scale require special attention. The ratio
$U_j$ cannot be computed in the general case; hence the restricted model is
always applied to these two items in any scale. The estimating procedure for
the diagonal cross-products is indeterminate, since the first and last cross-products
on the diagonal are not contained in any of the second-order minors
described. But an alternative computation is possible after the parameters of
all other items have been computed. The product $b_1c_1$ can be computed by
means of the equation

$$b_1c_1 = \frac{[1j]}{4b_j(1 - c_j)}. \tag{13}$$

Using this product,

$$b_1 = \tfrac{1}{2}(2p_1 + 4b_1c_1 - 1). \tag{14}$$

For the parameters of the last item, we compute

$$b_n(1 - c_n) = \frac{[jn]}{4b_jc_j}; \tag{15}$$

$$b_n = \tfrac{1}{2}[1 - 2p_n + 4b_n(1 - c_n)]. \tag{16}$$

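
The treatment of the end items can be sketched as follows. The forms used for equations (13) and (15) are those implied by equation (3), and the final step that separates c from the product (dividing by b) is an inference rather than something stated in the source. Function and argument names are hypothetical; `b_j` and `c_j` are the already computed parameters of any interior item j.

```python
def first_item(p1, cp_1j, b_j, c_j):
    """Restricted-model b and c for the first item from one cross-product [1j]."""
    b1c1 = cp_1j / (4.0 * b_j * (1.0 - c_j))       # equation (13), as assumed here
    b1 = 0.5 * (2.0 * p1 + 4.0 * b1c1 - 1.0)       # equation (14)
    return b1, b1c1 / b1


def last_item(pn, cp_jn, b_j, c_j):
    """Restricted-model b and c for the last item from one cross-product [jn]."""
    bn_gap = cp_jn / (4.0 * b_j * c_j)             # b_n(1 - c_n), equation (15), as assumed here
    bn = 0.5 * (1.0 - 2.0 * pn + 4.0 * bn_gap)     # equation (16)
    return bn, 1.0 - bn_gap / bn


if __name__ == "__main__":
    # Hypothetical error-free inputs built from b_1 = .30, c_1 = .20 and an
    # interior item with b_j = .35, c_j = .50.
    print(first_item(0.68, 0.042, 0.35, 0.50))
```

In practice one such estimate is available for each interior item j, and averaging them, as is done for the other quantities, would be a natural choice.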


When all of the latent parameters have been computed, they are in turn 
used to compute the set of ultimate frequencies; that is, the expected fre- 
quencies of occurrence of the various possible response patterns. 

The parameters $c_i$ are ordered from least to greatest; then, assuming
$c_0 = 0$, $c_{n+1} = 1$, the quantities

$$c_{j+1} - c_j = G_j \tag{17}$$

are computed. Each response pattern consists of $n$ symbols, $S_i$. These symbols
are ordered, and a variable defined such that

$$
\begin{aligned}
Y_{ij} &= a_i + b_i, & S_i &= + \text{ and } i \le j;\\
Y_{ij} &= a_i - b_i, & S_i &= + \text{ and } j < i;\\
Y_{ij} &= 1 - a_i - b_i, & S_i &= - \text{ and } i \le j;\\
Y_{ij} &= 1 - a_i + b_i, & S_i &= - \text{ and } j < i.
\end{aligned}
\tag{18}
$$

Then the expected frequency of a response pattern is

$$E(S_1 S_2 \cdots S_n) = \sum_{j=0}^{n} G_j \prod_{i=1}^{n} Y_{ij}. \tag{19}$$


This completes the latent structure computations. 
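
Equations (17) to (19) translate directly into a short routine. The sketch below is a minimal illustration with hypothetical parameter values; it returns expected proportions, which would be multiplied by the sample size to give expected frequencies. Patterns are coded as tuples of 1 (positive) and 0 (negative), a convention assumed for the example.

```python
from itertools import product


def expected_proportions(a, b, c):
    """Expected proportion of each response pattern, equations (17) to (19).

    a, b, c: lists of item parameters, with c in ascending order.
    """
    n = len(c)
    if any(c[i] > c[i + 1] for i in range(n - 1)):
        raise ValueError("items must be ordered by ascending c")
    bounds = [0.0] + list(c) + [1.0]
    sizes = [bounds[j + 1] - bounds[j] for j in range(n + 1)]   # G_j, equation (17)
    result = {}
    for pattern in product((1, 0), repeat=n):
        total = 0.0
        for j in range(n + 1):                 # latent class j lies between c_j and c_(j+1)
            term = sizes[j]
            for i in range(1, n + 1):          # 1-based item index, as in the text
                positive = a[i - 1] + b[i - 1] if i <= j else a[i - 1] - b[i - 1]
                term *= positive if pattern[i - 1] else 1.0 - positive   # Y_ij, equation (18)
            total += term
        result[pattern] = total                # equation (19)
    return result


if __name__ == "__main__":
    freqs = expected_proportions([0.5, 0.55, 0.5], [0.3, 0.35, 0.25], [0.2, 0.4, 0.8])
    for pattern, value in freqs.items():
        print(pattern, round(value, 4))
```

Two checks are available on any such computation: the proportions sum to one, and the marginal proportion for each item equals the value given by equation (1).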


3. Empirical Comparison 


Twenty sets of data were obtained from research projects in progress in 
the Laboratory. The data were obtained as responses to attitude items; the 
two principal sources were a study of value-attitudes of high-school boys, and 
of attitudes of Air Force personnel. Five of these scales were rejected when it
was discovered that, with the computing procedures then available, nonsense
results were obtained in one or more respects. Neither model was favored by
this omission. The manifest data of the fifteen remaining scales are presented
in Tables 1A and 1B. Ten scales comprise four items each, and five scales
five items each. The reproducibility (defined by Guttman) is denoted by R.

The second-order minors of the cross-product matrices of all these sets of
data appeared to vanish satisfactorily. The largest discovered was .0032.


TABLE 1A
Manifest Data: Four Item Scales

                                   Scales
         1      2      3      4      5      6      7      8      9     10
N       104    205    205    203    210    197    200    203    207   1000
P1    .8846  .7220  .6585  .7586  .7667  .7716  .7550  .7291  .7729  .5710
P2    .5759  .6244  .4976  .6108  .6000  .4569  .4550  .5369  .5507  .4310
P3    .4038  .3805  .3317  .4187  .4619  .3959  .3350  .2857  .3913  .3070
P4    .1154  .3122  .2780  .2660  .2095  .1726  .2050  .1970  .1691  .1380
R     .9591  .9207  .9329  .9360  .9286  .9404  .9438  .9347  .9360  .9403











Table 2 presents the estimated latent parameters for both the general and 
the restricted cases. Where probabilities slightly greater than unity, or less 
than zero, occur (presumably because of sampling error or poorness of fit of 
the model), arbitrary values of zero or one were substituted in further com- 
putation. While comparison of the estimated parameters themselves does not 
give information concerning the goodness of fit of the model directly, readers 
will be interested in the differences between the models. 

TABLE 1B
Manifest Data: Five Item Scales

                   Scales
        11     12     13     14     15
N      306    293    390    301    175
P1   .7810  .9420  .8026  .8605  .7486
P2   .6667  .7133  .6949  .6645  .6343
P3   .4510  .5119  .5692  .4153  .5200
P4   .2092  .2765  .3923  .2425  .4000
P5   .0654  .1638  .2897  .1429  .3200
R    .9758  .9536  .9482  .9821  .9691


The expected frequencies of the response patterns were computed. Summary
statistics describing the differences between the frequencies predicted
by the model and the frequencies observed are given in Table 3. In thirteen
of the fifteen cases, the sum of the absolute differences between manifest and
expected frequencies is smaller in the case of the general model. The sum of
squared differences allows the reader to observe whether the discrepancies
are few or many: for a given total difference, the upper limit of the sum of
squared differences occurs when the total difference is in a single cell, the
lower limit when the differences are uniformly distributed over all response
patterns. The sum of ratios of squared differences to expected frequencies
allows the reader to observe whether the discrepancies occur primarily in cells
with large or small entries. The last two measures were plotted for the two
models against the sum of the absolute differences, and no apparent differences
were indicated in the intersection points or slopes of the lines of the two
models, or in the distribution of points about the lines. The reader will be
aware that the three measures just described are not comparable. Separate
comparisons of the models are suggested, utilizing each measure independently.
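
The behavior of the three summary measures can be illustrated with a short computation. The following Python sketch is not part of the original analysis: the function name `discrepancy_measures` and all frequencies are hypothetical, chosen only to show how one and the same total absolute difference gives very different sums of squares when it is concentrated in one cell or spread over all cells.

```python
def discrepancy_measures(observed, expected):
    """Return (sum |o - r|, sum (o - r)^2, sum (o - r)^2 / r)."""
    diffs = [o - r for o, r in zip(observed, expected)]
    abs_sum = sum(abs(d) for d in diffs)
    sq_sum = sum(d * d for d in diffs)
    ratio_sum = sum(d * d / r for d, r in zip(diffs, expected))
    return abs_sum, sq_sum, ratio_sum


# Hypothetical expected frequencies for the 16 patterns of a four-item scale.
expected = [20.0] * 16

# Two hypothetical sets of observed frequencies with the same total absolute
# difference (16). Totals are not forced to agree; this is only a toy case.
concentrated = [36.0] + [20.0] * 15   # whole difference in a single cell
spread = [21.0] * 16                  # one unit of difference in every cell

for label, observed in (("concentrated", concentrated), ("spread", spread)):
    print(label, discrepancy_measures(observed, expected))
```

With a total absolute difference T over c cells, the sum of squares in this toy case runs from T²/c (uniform spread) to T² (single cell), which is the range described in the text.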

For each scale, the ratio of the sum of absolute differences between fitted and
observed frequencies under the restricted model to the corresponding sum under
the general model was taken. The ratios ranged from 0.92 to 12.56. The median
ratio was 1.75. Thus the efficiency of the general model tends to be about twice that
of the restricted model according to this criterion.


TABLE 3
Statistics Describing Differences Between
Theoretical Distributions and Manifest Data

                     Σ|o − r|                      Σ(o − r)²             Σ(o − r)²/r
Scale     N     General  Restricted   Ratio    General  Restricted   General  Restricted
  1      104     12.00      21.07      1.75      19.10      62.58      4.28       7.28
  2      205     13.44      38.75      2.88      17.23     177.80      1.67      11.88
  3      205     25.50      27.45      1.08      53.96      59.63     13.54      15.46
  4      203     36.11      42.89      1.18     157.87     165.28     17.70      20.25
  5      210     18.22      31.28      1.72      39.41     122.78      3.95      11.98
  6      197     47.78      44.60       .94     456.86     256.58     41.57      31.15
  7      200     22.07      46.84      2.12      45.83     264.16     10.97      20.04
  8      203     10.35      14.95      1.45      10.58      17.48      2.65       6.22
  9      207     23.93      26.03      1.09      65.08      67.91      9.80      10.86
 10     1000     55.12      50.92       .92     279.56     227.97     12.92      17.71
 11      306     14.87     186.85     12.56      15.86    5757.49      6.06     141.37
 12      293     71.82     186.70      2.60     374.67    3444.88     36.06     320.95
 13      390     58.44     260.63      4.48     205.68    9308.64     43.42     356.41
 14      301     28.35      53.75      1.92      56.89     259.04     56.56      60.16
 15      175     15.74     119.63      7.59      15.32    2727.76     48.92     864.95
                                  Mdn = 1.75

Code: o = Observed (manifest) data.
      r = Frequencies computed using estimated parameters.
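
The ratio column of Table 3 and its median can be recomputed from the two columns of absolute differences. The Python sketch below is an illustration added for the reader, not the original computation; the variable names are hypothetical, and because the published entries are rounded, a recomputed ratio may differ from the printed one in the second decimal place.

```python
from statistics import median

# Sum |o - r| from Table 3 for scales 1-15, as (general, restricted) pairs.
abs_diff = [
    (12.00, 21.07), (13.44, 38.75), (25.50, 27.45), (36.11, 42.89),
    (18.22, 31.28), (47.78, 44.60), (22.07, 46.84), (10.35, 14.95),
    (23.93, 26.03), (55.12, 50.92), (14.87, 186.85), (71.82, 186.70),
    (58.44, 260.63), (28.35, 53.75), (15.74, 119.63),
]

# Ratio of the restricted-model discrepancy to the general-model discrepancy.
ratios = [restricted / general for general, restricted in abs_diff]

# Number of scales on which the general model fits more closely.
general_better = sum(ratio > 1 for ratio in ratios)

print("median ratio:", median(ratios))
print("scales favoring the general model:", general_better)
print("range:", min(ratios), max(ratios))
```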


4. Discussion 


The restricted model, as we have seen, predicts the ultimate response 
frequencies on the basis of data of orders zero, one, and two. The general model 
also utilizes third-order data. There are $2^n$ ultimate frequencies in a
dichotomous system, or $2^n$ independent positive frequencies. The number of
data of zero, first, second, and third order, respectively, are 1, $n$,
$n(n-1)/2$, and $n(n-1)(n-2)/6$. For four items, the data up to second order comprise 11
entries. Up to third order, 15 entries are included. There is only one datum of 
fourth order. Thus the two models are, for four items, estimating 16 entries 
from 11 or 15 entries, respectively, and 15/11 = 1.36. For five items, the ratio 
becomes 1.63. On this basis, we could expect a fit about one and a half times as 
good with the general model as with the restricted model. 

On the other hand, the positive frequencies are averaged together in the 
fitting procedure. Thus the general model might not be so advantageous as 
predicted above. The number of latent parameters free and fitted in the general 
case is 3n — 2. The number for the restricted case is only 2n. The ratio of 
these quantities is, for four items, 1.25; for five items, 1.30. 
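
The counts and ratios quoted in the two preceding paragraphs can be checked directly. The following Python sketch is an added illustration; the function names `data_counts` and `parameter_counts` are hypothetical, and the formulas 3n − 2 and 2n are simply those stated in the text.

```python
from math import comb


def data_counts(n):
    """Number of data entries up to second and up to third order for n items."""
    up_to_second = sum(comb(n, k) for k in range(0, 3))
    up_to_third = sum(comb(n, k) for k in range(0, 4))
    return up_to_second, up_to_third


def parameter_counts(n):
    """Latent parameters fitted: (general, restricted) = (3n - 2, 2n)."""
    return 3 * n - 2, 2 * n


for n in (4, 5):
    second, third = data_counts(n)
    general, restricted = parameter_counts(n)
    print(
        f"n={n}: ultimate frequencies={2 ** n}, "
        f"data used={second} vs {third} (ratio {third / second:.3f}), "
        f"parameters={restricted} vs {general} (ratio {general / restricted:.3f})"
    )
```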

The median ratio of goodness of fit, noted above, was 1.75. Thus the
general model seems to be as advantageous as could be expected. The question
now arises whether this benefit is sufficient to justify the extra computation 
involved. The answer depends on the requirements of the researcher. In the 
first place, it must be clear that the time required for computation of the three- 
parameter case is not great compared to other time costs in research, such as 
gathering the data. Hence, if the researcher is preparing a final report, he will 
certainly turn to the three-parameter model. 

If the researcher is engaged in refining a scale, however, he may use latent
distance analysis to eliminate from the scale items which do not correlate. The
researcher will presumably use the general model in this case, also. However,
it must be recognized that many other approaches to the elimination of poor
items are more economical than latent structure analysis.

In an earlier stage of research, the restricted case may be preferable. 
Cumulative scales are used to order respondents. If a preliminary ordering is 
required, even though the scale is to be revised later, a latent distance analysis 
provides a basis for ordering respondents. It has even been suggested (1) that 
the latent distance ordering may be approximated sufficiently without com- 
putation of latent parameters. 

A researcher who has made the initial decision to use Guttman or latent 
distance scale analysis will begin by finding a Guttman scale or approximation. 
He will perhaps use the technique of (1) in an early stage of research. For most 
accurate results, however, the general model appears to give, on the basis of 
empirical study, a consistent advantage in goodness of fit.


THE CONCEPT OF PARSIMONY IN FACTOR ANALYSIS
MCGILL UNIVERSITY


The concept of parsimony in factor analysis is discussed. Arguments are 
advanced to show that this concept bears an analogic relationship to entropy 
in statistical mechanics and information in communication theory. A formal 
explication of the term parsimony is proposed which suggests approaches to 
the final resolution of the rotational problem. This paper provides the rationale 
underlying Carroll’s (2) analytical solution for approximating simple struc- 
ture, and the solutions of Saunders (7) and Neuhaus and Wrigley (5). 


Introduction 


A vague and ill-defined concept of parsimony has been implicit in the 
thinking of factorists since Spearman first proposed his theory of two factors. 
That theory was overly parsimonious. It implied that human ability could be 
described in a simpler way than subsequent experimental evidence confirmed. 
The parsimony concept is widespread in the thinking of Thurstone (9) both 
in relation to science in general and factor analysis in particular. Commenting 
on the nature of science, he writes, “It is the faith of all science that an
unlimited number of phenomena can be comprehended in terms of a limited
number of concepts or ideal constructs. Without this faith no science could
ever have any motivation.” Commenting on the factor problem, he writes,
“In a factor problem one is concerned about how to account for the observed
correlations among all the variables in terms of the smallest number of factors
and with the smallest possible residual error.” Again the essential feature of
Thurstone’s simple structure concept is parsimony. He refers to simple
structure as “the simple idea of finding the smallest number of parameters for
describing each test.” Such views as these have been widely accepted by
factorists whose major concern has been the quest for a simple way to define 
and describe the abilities of man. Few will disagree with the statement that 
the object of factor analysis is to attain a parsimonious definition of factors, 
which obversely provides a parsimonious description of test variables. 

Questions can be raised as to the meaning of the term parsimony in par- 
ticular contexts. Consider, first, the use of the term in relation to scientific 
theory. Many scientific theories, although not all, are comprised of a number 
of primitive postulates and the deductive consequences of those postulates, 
called theorems, which in turn are presumed to be isomorphic within certain 
probabilistic limits with a class of observable phenomena. One theory can
be said to be more parsimonious than another when it contains fewer primitive
postulates and yet accounts equally well for the same class of phenomena. 
In this context the term law of parsimony is occasionally used, the law stating, 
rather loosely, that no more postulates than necessary be assumed to account 
for some class of events. 

Consider now the meanings which may attach to the term parsimony in 
factor analysis. In relation to the problem of the number of factors the term 
has a precise meaning. Parsimony is defined simply and explicitly in terms of 
the number of factors required to account for the observed correlations. If we 
account for the intercorrelations in terms of the smallest number of factors, 
given restrictions imposed by the rationale of the factor problem in general, 
then such a solution is the most parsimonious one as far as the number of 
factors is concerned. This assumes, of course, that we know when to stop 
factoring. Knowing this, the meaning of the term parsimony in this context is
precise and unambiguous. None the less, parsimony in this sense is a concept 
of limited usefulness; it is a discrete concept and insufficiently general. It will 
be shown to be subsumed under a more general formulation. 

The meaning of the term parsimony in relation to the rotational problem 
is neither explicit nor precise. None the less, factorists have implicitly made 
use of a vague and intuitive concept of parsimony as a basis for preferring one 
possible solution to another. Rotation to maximize the number of zero loadings 
in the rows and columns of the factor matrix yields a definition of factors, and 
an obverse description of tests, which is intuitively more parsimonious than 
many other possible solutions. Parsimony is, as mentioned above, basic to the
simple-structure idea. The use of oblique in preference to orthogonal axes is 
justified on the grounds of parsimony of definition and description. When 
what appears to be an intuitively simple solution can be obtained on rotation, 
that solution appears to be particularly compelling. Regardless of the impre- 
cision of this statement many who have worked with real data will feel that 
they understand quite well what the statement means. 

Since factorists in dealing with the rotational problem employ an intuitive 
concept of parsimony, the question can be raised as to whether the term can 
be assigned a precise and more explicit meaning, which will enable the rota- 
tional problem to be more clearly stated and admit the possibility of a unique 
and objective solution. This involves explication in the Carnap sense. Carnap 
has used the term explication to refer to the process of assigning to a vague, 
ill-defined, and perhaps largely intuitive concept a precise, explicit and formal 
meaning, the old and the new concepts being referred to as the explicandum
and the explicatum respectively. In the present context we are given the
explicandum, the vague concept of parsimony seemingly implicit in existing
intuitive-graphical methods of rotation. The problem is to formally define the
explicatum. If in this context the term parsimony can be formally explicated 
in such a way that statements respecting the amount of parsimony in any
solution are possible, then we may proceed to develop a rotational procedure
which will maximize the amount of parsimony in the final solution. Presumably,
depending on the nature of the explication made, a correspondence
should exist between solutions obtained by existing intuitive-graphical meth- 
ods, and those obtained by such a possible precise objective method, a cir- 
cumstance resulting from the fact that the explicandum and the explicatum 
are epistemically correlated. The concept of epistemic correlation is used here 
in the sense proposed by F. S. C. Northrop (6).

It may be argued that Thurstone’s concept of simple structure reduces 
to an attempt to render explicit the concept of parsimony in factor analysis. 
To a certain extent this argument must be accepted. The simple-structure 
concept, however, while it is clearly in the direction of the required explication, 
has a number of disadvantages. First, Thurstone’s (9) conditions for simple 
structure cannot, in my opinion, be represented as terms in a mathematical 
expression capable of manipulation. Such an approach would probably prove 
intractable. Second, the concept is expressed in discrete and not continuous 
terms. The conditions pertaining to the number of zeros in the rows and col- 
umns of a factor matrix, and the like, are discrete conditions. The concept of 
complexity of a test is a discrete and not a continuous concept. Thurstone 
regards the complexity of a test as the number of parameters which it involves 
and which it shares with one or more other tests in a battery. Thus, if a test 
has loadings regarded as significant on three factors its complexity is 3, on 
two factors its complexity is 2, and so on. Two tests may, however, have the 
same complexity in the Thurstone sense and yet may appear to differ markedly 
in complexity, or its opposite parsimony, in some intuitive sense. To illustrate, 
consider two tests, the first with loadings .20, .20, and .82 on three factors, and 
the second with loadings .50, .50 and .50, on the same three factors. The load- 
ings of the first test appear intuitively to serve as a less complex and more 
parsimonious description than the loadings of the second test. Presumably we 
require a formal definition of complexity in continuous terms which will reflect 
such differences as these. Third, because the simple-structure concept involves 
discrete formulations, it is insufficiently general. It can be shown that the 
simple-structure case is a particular case, indeed a most important particular 
case, of a more general formulation. Reasons for arguing in the above fashion 
will become clear in what follows. 


Entropy and the Factor Problem 


It may appear farfetched that the concept of entropy in statistical me- 
chanics or the concept of information employed by Shannon (8) in communi- 
cation theory should have direct relevance to the rotational problem. An 
attempt will be made to demonstrate that concepts of this type can be used 
in reformulating the rotational problem in such a way as to admit the
possibility of an acceptable objective solution.

Boltzmann’s formulation of entropy in statistical mechanics and Shannon’s
formulation of information in communication theory are both measures
of structure or organization. The expression for entropy is an index of the 
degree of organization, or obversely the degree of disorder, existing in an 
assembly of atomic particles. Likewise Shannon’s expression for information 
can be regarded as a measure of freedom of choice in constructing messages. 
If there is little freedom of choice, and the probability of certainty is high, the 
situation is highly structured or organized. If there is a high degree of freedom 
of choice, and the probability of certainty is low, the situation is highly dis- 
organized, or characterized by a high degree of randomness. Both in statistical 
mechanics and communication theory we visualize a continuum of possible 
states extending from a state of chaos, disorder, or complete randomness at 
one extreme to a state of maximal organization or structure at the other 
extreme, structure or organization being viewed as a progression from a state 
of randomness, and randomness as the obverse regression. 

The basic concepts underlying measures of entropy and information, and 
the measures themselves, are very general and are applicable in many situa- 
tions distantly removed from statistical mechanics and information theory. 
For example, in a simple learning situation certain types of alternative re- 
sponse patterns become more probable and others less probable with repeti- 
tion. The response pattern becomes more highly organized or structured, and 
Shannon’s measure, or some similar measure, can be used as a statistic descrip- 
tive of the degree of organization attained at different stages of learning. Learn- 
ing and forgetting in this context are viewed respectively as a progression 
from and a regression towards randomness of response. 

Shannon’s (8) definition of information takes the form

$$H = -\sum_{i} p_i \log p_i . \qquad (1)$$

This expression assumes a set of alternative events with associated
probabilities $p_1, p_2, \ldots, p_n$. $H$ is a measure of uncertainty as to what
will happen. The information contributed by a particular event of probability
$p_i$ is $-\log p_i$.
Shannon and others have pointed out that this is a form of the measure for 
entropy used in statistical mechanics. Measures of entropy or information 
may be otherwise defined. The bases for preferring one measure to another 
reside in the properties we wish to attach to our measure for particular pur- 
poses, and in the ability of the measure to satisfy certain intuitions. An alter- 
native measure which has been proposed takes the form 


$$D = 1 - \sum_{i} p_i^2 . \qquad (2)$$
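
The two measures can be compared on a few simple distributions. The Python sketch below is an added illustration under stated assumptions: the function names `shannon_h` and `d_measure` and the three distributions are hypothetical, natural logarithms are used, and the term 0 log 0 is taken as zero.

```python
import math


def shannon_h(p):
    """H = -sum p_i log p_i (natural logarithm; 0 log 0 taken as 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def d_measure(p):
    """D = 1 - sum p_i^2."""
    return 1 - sum(x * x for x in p)


# Hypothetical distributions over four alternative events.
uniform = [0.25, 0.25, 0.25, 0.25]   # maximal uncertainty
peaked = [0.85, 0.05, 0.05, 0.05]    # highly structured
certain = [1.0, 0.0, 0.0, 0.0]       # no uncertainty at all

for label, p in (("uniform", uniform), ("peaked", peaked), ("certain", certain)):
    print(label, shannon_h(p), d_measure(p))
```

In this sketch both measures vanish for the certain case and are largest for the uniform case, so they order the three distributions in the same way even though their numerical values differ.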


The measure D has been discussed by Ferguson (4), Bechtoldt (1), and
Cronbach (3). D may be used in certain situations as an alternative to H. Its
properties are somewhat different from those of H, and, although D is simpler
and easier to calculate, it may be less amenable to certain forms of
mathematical manipulation.

If we imagine a continuum of all possible configurations of vectors in r
dimensions we have at the one extreme a chaotic, disordered, or strictly ran- 
dom configuration, within certain boundaries, and at the other a highly 
organized or structured configuration, the maximal degree of structuring pos- 
sible being an overdetermined simple structure of complexity 1, in the 
Thurstone sense, for each test. This situation is directly analogous to situa- 
tions in statistical mechanics or communication theory, except here we are not 
concerned with an assembly of particles or a communication source, but 
rather with a vector model which is the geometric image of the intercorrela- 
tions. In passing, it appears quite possible to employ usefully vector models in 
communication theory. The problem now arises as to how a measure of the 
degree of randomness or degree of organization characteristic of a configura- 
tion of vectors can be defined. Conceptually such a measure is the direct 
analogue of measures of entropy or information. In this context it will be 
referred to as a measure of parsimony. 

To speak of the parsimony or entropy of a configuration of vectors we 
must first describe the configuration. This is done by inserting reference axes 
in any arbitrarily chosen position. A factor matrix is thereby obtained. Such 
a matrix does not describe the structural properties of the configuration unless 
its location is determined by those structural properties themselves and by no 
other circumstance. This is a crucial idea in Thurstone’s simple-structure 
concept, but has seemingly been misunderstood by some factorists. Presum- 
ably, for any configuration, a location of the reference axes can be found which, 
in a least-squares or other acceptable sense, best describes the structural 
properties of the configuration. The problem of structure and its description 
is implicit in many statistical problems, as, for example, in fitting a curve to a 
set of points. The curve-fitting problem differs, however, from the factor 
problem in one important respect. In curve fitting the structural properties of 
sets of points are reflected in certain invariant properties of mathematical 
expressions. In the factor case, the structural properties of the configuration 
are described and communicate themselves to our understanding by the choice 
of location of axes. 


On Measures of Parsimony 


The problem of assigning to the term parsimony a precise mathematical 
meaning may be variously approached. One approach is the following. Con- 
sider a point, P, plotted in relation to two orthogonal reference axes. These 
axes may be rotated into an indefinitely large number of positions, yielding, 
thereby, an indefinitely large number of descriptions of the location of the 
point. Which position provides the most parsimonious description? Intuitively 
it appears that the most parsimonious description will result when one or other 
of the axes passes through the point, the point being, thereby, described as a 
distance measured from an origin. Likewise, as one or other of the reference
axes is rotated in the direction of the point, the product of the two coordinates
grows smaller. This product is a minimum and equal to zero when one of the 
axes passes through the point, and is a maximum when a line from the origin 
and passing through the point is at a 45° angle to both axes. These circum- 
stances suggest that some function of the product of the two coordinates might 
be used as a measure of the amount of parsimony associated with the descrip- 
tion of the point. Note, further, that this product is an area, and is a measure 
of the departure of the point from the reference frame. For any set of collinear 
points an identical argument holds. In this case some function of the sum of 
the products of the coordinates might serve as a measure of parsimony. 
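
The behavior of the product of coordinates under rotation of the axes can be checked numerically. The following Python sketch is an added illustration; the point P and the function names `coordinates` and `product` are hypothetical.

```python
import math


def coordinates(point, theta):
    """Coordinates of `point` after the orthogonal axes are turned through theta."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return x * c + y * s, -x * s + y * c


def product(point, theta):
    """Product of the two coordinates of `point` in the rotated frame."""
    u, v = coordinates(point, theta)
    return u * v


P = (0.6, 0.3)                         # hypothetical point
direction = math.atan2(P[1], P[0])     # angle of the line from origin to P
squared_length = P[0] ** 2 + P[1] ** 2

# First axis passing through P: one coordinate is zero, so the product is zero.
through_point = product(P, direction)

# Line from the origin to P at 45 degrees to both axes: the product is largest.
at_45_degrees = product(P, direction - math.pi / 4)

print(through_point, at_45_degrees, squared_length / 2)
```

The largest product equals half the squared distance of the point from the origin, which is the area interpretation mentioned in the text.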

Consider now any set of points with both positive and negative coordi- 
nates, the usual factor case. Here the sum of products is inappropriate because 
a zero sum can result when the sums of negative and positive products are 
equal. Here following the usual statistical convention we may use the sum of 
squares of products of coordinates, or some related quantity, as a measure of 
parsimony. In the general case of r dimensions we are led to a consideration 
of the $r(r-1)/2$ sums of squares of products of pairs of coordinates. Thus
given $n$ tests and $r$ factors, where $a_{jm}$ is the loading of factor $m$ on
test $j$, the quantity under consideration may be written as

$$\sum_{j=1}^{n} \sum_{m<k}^{r(r-1)/2} (a_{jm} a_{jk})^2 . \qquad (3)$$

Now it may be readily shown that

$$\sum_{j=1}^{n} h_j^4 = \sum_{j=1}^{n} \sum_{m=1}^{r} a_{jm}^4 + 2 \sum_{j=1}^{n} \sum_{m<k}^{r(r-1)/2} (a_{jm} a_{jk})^2 , \qquad (4)$$

where $h_j^2$ is the communality of the jth test.

Inspection of this expression indicates that as one of the terms to the right
of the expression increases the other decreases, since the communalities, and
hence the left-hand side, are unchanged by orthogonal rotation. Thus any
procedure which maximizes one of the terms will minimize the other, and vice
versa. Either term, or some function of either term, could serve as the measure
required. Let us define a measure of parsimony as

$$I = \sum_{j=1}^{n} \sum_{m=1}^{r} a_{jm}^4 . \qquad (5)$$


This may be spoken of as the parsimony of a factor structure or a factor
matrix. The quantity $I$ will vary depending on the location of the reference
frame. When the reference frame is located in order to maximize $I$, the most
parsimonious solution, the factor matrix will describe in a least-squares sense
the structural properties of the configuration, and the maximum value of $I$
attainable for any set of data will be spoken of as the parsimony of the test
configuration. In this case $I$ is a measure of the degree of structure or
organization which characterizes the configuration, and as such is analogous
to a measure of entropy or information. $I$ has its maximum value for n tests
and r factors when the solution is an overdetermined simple structure with
complexity 1, in Thurstone’s sense, for all tests. This presumably is the
maximum degree of structure or organization possible for n tests and r factors.
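
Identity (4), and the trade-off between its two right-hand terms, can be verified numerically for a small two-factor case. The Python sketch below is an added illustration; the loading matrix and all function names are hypothetical.

```python
import math


def fourth_power_sum(A):
    """Sum over tests and factors of the fourth powers of the loadings, as in (5)."""
    return sum(a ** 4 for row in A for a in row)


def cross_product_sum(A):
    """Sum over tests and pairs of factors of squared products, as in (3)."""
    total = 0.0
    for row in A:
        for m in range(len(row)):
            for k in range(m + 1, len(row)):
                total += (row[m] * row[k]) ** 2
    return total


def communality_square_sum(A):
    """Sum over tests of the squared communalities, the left side of (4)."""
    return sum(sum(a * a for a in row) ** 2 for row in A)


def rotate(A, theta):
    """Turn the two orthogonal reference axes through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[x * c + y * s, -x * s + y * c] for x, y in A]


A = [[0.6, 0.3], [0.5, 0.4], [0.2, 0.7]]   # hypothetical loadings, 3 tests
for degrees in (0, 15, 30, 45):
    B = rotate(A, math.radians(degrees))
    print(degrees, communality_square_sum(B),
          fourth_power_sum(B), cross_product_sum(B))
```

The first printed quantity stays fixed as the axes turn, while the other two change in opposite directions, so that maximizing one is equivalent to minimizing the other.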

The parsimony of a single test may be defined simply as 


$$I_j = \sum_{m=1}^{r} a_{jm}^4 , \qquad (6)$$


that is, the sum of the squares of the factor variances for the test. This formu- 
lation is an extension of Thurstone’s concept of complexity. Observe that the 
formulation takes into account the question of the number of factors. Values 
of $I_j$ are directly comparable one with another regardless of the number of
factors involved. Circumstances may arise where the degree of parsimony of a 
test with loadings on r factors is greater than that for a test with loadings on 
a smaller number of factors. Thus the parsimony of a test with factor variances 
.70, .15, .15 on three factors is greater than that for a test with variances .50 
and .50 on two factors. The former is in the present sense a more parsimonious 
factorial description than the latter, regardless of the number of factors in- 
volved. Our approach here suggests certain lines of thought relevant to the 
problem of the number of factors required to account for the intercorrelations. 
Presumably we do not, of necessity, set out to account for the intercorrelations 
in terms of the smallest number of factors, but rather in the most parsimonious 
way possible. Thus questions of parsimony may be presumed to subsume 
questions regarding the number of factors. This line of thought requires 
further development. We may note that $I_j$ is the direct analogue of the term
$\sum p_i^2$ in the previously mentioned statistic, $D = 1 - \sum p_i^2$, which in certain
situations may be used as an alternative to Shannon’s measure. $I_j$ is simply
the sum of the squares of variances, and these variances are in effect propor- 
tions. 
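
The numerical examples used in this paper can be worked through with the test parsimony defined in (6). The Python sketch below is an added illustration; the function names are hypothetical, and the input values are the factor variances and loadings quoted in the text.

```python
def parsimony_from_variances(variances):
    """Sum of squared factor variances for one test."""
    return sum(v * v for v in variances)


def parsimony_from_loadings(loadings):
    """Sum of fourth powers of the loadings for one test."""
    return sum(a ** 4 for a in loadings)


# Factor variances quoted in the text: three factors versus two factors.
three_factors = parsimony_from_variances([0.70, 0.15, 0.15])
two_factors = parsimony_from_variances([0.50, 0.50])

# Loadings quoted earlier: two tests of equal complexity in Thurstone's sense.
uneven = parsimony_from_loadings([0.20, 0.20, 0.82])
even = parsimony_from_loadings([0.50, 0.50, 0.50])

print(three_factors, two_factors)
print(uneven, even)
```

In both comparisons the test whose variance is concentrated on a single factor receives the larger value, in agreement with the intuitive judgments described in the text.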

The measure of parsimony proposed in this paper is only one of many 
possible measures. It is not improbable that some modification of the present 
measure will ultimately be preferred. Another obvious approach is to define 
parsimony, or its opposite, complexity, as some logarithmic function of the
factor variances. Thus the parsimony of a test might be defined as

$$-\sum_{m=1}^{r} a_{jm}^2 \log a_{jm}^2 , \qquad (7)$$

and that of a factor matrix as

$$-\sum_{j=1}^{n} \sum_{m=1}^{r} a_{jm}^2 \log a_{jm}^2 . \qquad (8)$$


The reader will of course observe the direct analogic relationship between 
these expressions and Shannon’s H. The statistic ultimately preferred will
depend on the properties belonging to that statistic in the factor context and
on certain intuitions being satisfied. 
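
The logarithmic form can be set beside the fourth-power form for the same test. The Python sketch below is an added illustration under stated assumptions: the function names are hypothetical, natural logarithms are used, 0 log 0 is taken as zero, and the logarithmic quantity is read, like H, as increasing with complexity rather than with parsimony.

```python
import math


def log_complexity(variances):
    """-sum v log v over the factor variances v = a^2 (0 log 0 taken as 0)."""
    return -sum(v * math.log(v) for v in variances if v > 0)


def fourth_power_parsimony(variances):
    """Sum of squared factor variances, i.e. sum of fourth powers of loadings."""
    return sum(v * v for v in variances)


# Factor variances taken from the example in the text, plus a complexity-1 test.
cases = {
    "three factors": [0.70, 0.15, 0.15],
    "two factors": [0.50, 0.50],
    "one factor": [1.00],
}

for label, variances in cases.items():
    print(label, log_complexity(variances), fourth_power_parsimony(variances))
```

For these particular values the two forms need not rank the tests alike: the three-factor test has the larger fourth-power value but also the larger logarithmic value, which is one example of the differing properties on which a final choice of statistic would depend.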

Some advantage may attach to the definition of a statistic whereby the 
parsimony of one factor matrix can be directly compared with that of another. 
We may require that such a statistic take values ranging from 0 to 1. One 
such statistic is of the form 


$$\bar{I} = \frac{\sum_{j=1}^{n} \sum_{m=1}^{r} a_{jm}^4}{n} . \qquad (9)$$


This statistic is the average test parsimony, and, theoretically, may take 
values ranging from 0 to 1. A variety of other related statistics may readily be 
defined. 

The above discussion relates to the orthogonal case. Presumably the 
arguments can be generalized to deal with the oblique case. Increased parsi- 
mony of definition and description is of course, and always has been, the 
primary purpose underlying the use of oblique axes. 


Analytical Solutions 


Consider now the question of developing a rotational method to yield 
the most parsimonious solution in the sense in which the term has been 
explicated here. This problem has recently been investigated by Carroll (2), 
Saunders (7), and Neuhaus and Wrigley (5). The work of Saunders, Neuhaus, 
and Wrigley came to the present author’s attention after the first draft of the 
present paper was prepared for publication. Carroll’s solution minimizes
$\sum_j \sum_{m<k} (a_{jm} a_{jk})^2$. The methods of Saunders, Neuhaus, and Wrigley maximize
$\sum_j \sum_m a_{jm}^4$, a more convenient approach. The work of these various authors
provides the necessary rotational procedures which follow directly from the 
rationale outlined in this paper. Carroll, Neuhaus and Wrigley were appar- 
ently led to their solutions by direct intuition resulting from experience with 
the factor problem. The present attempt was to develop a logical groundwork 
which would lead ultimately to an objective analytical solution and render 
explicit the meaning of such a solution. 
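
One way such a maximization might be carried out is by successive rotations of pairs of orthogonal axes. The Python sketch below is an added illustration and should not be read as the procedure of any of the authors cited: the function names, the stopping rule, and the loadings are hypothetical, and the closed-form angle is an assumption of this sketch, obtained by writing the criterion for one pair of factors as a function of the angle of rotation.

```python
import math


def parsimony(A):
    """Sum of fourth powers of all loadings."""
    return sum(a ** 4 for row in A for a in row)


def rotate_pair(A, m, k):
    """Rotate factors m and k in place to the angle maximizing parsimony."""
    C = D = 0.0
    for row in A:
        u = row[m] ** 2 - row[k] ** 2
        v = 2.0 * row[m] * row[k]
        C += u * u - v * v
        D += 2.0 * u * v
    # The criterion for this pair varies as C cos(4 phi) + D sin(4 phi).
    phi = math.atan2(D, C) / 4.0
    c, s = math.cos(phi), math.sin(phi)
    for row in A:
        x, y = row[m], row[k]
        row[m], row[k] = x * c + y * s, -x * s + y * c
    return phi


def maximize_parsimony(A, max_sweeps=50, tol=1e-10):
    """Cycle through all pairs of factors until no rotation exceeds tol."""
    B = [list(row) for row in A]
    r = len(B[0])
    for _ in range(max_sweeps):
        largest = 0.0
        for m in range(r - 1):
            for k in range(m + 1, r):
                largest = max(largest, abs(rotate_pair(B, m, k)))
        if largest < tol:
            break
    return B


# Hypothetical loadings: a complexity-1 structure turned through 30 degrees.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
start = [[x * c + y * s, -x * s + y * c] for x, y in simple]

result = maximize_parsimony(start)
print(parsimony(start), parsimony(result), parsimony(simple))
```

Each pairwise rotation cannot decrease the criterion, but with more than two factors the cycle is only assured of reaching a stationary position, not necessarily the largest attainable value.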

Carroll and others have shown with illustrative data that a correspon- 
dence exists between the most parsimonious analytical solution and the solu- 
tions reached by existing intuitive-graphical methods. Such correspondence is 
to be expected if the explication of the term parsimony is correlated, as it 
should be, with the intuitive concept of that term. Discrepancies between the 
analytical and intuitive-graphical solutions will of course occur. One reason 
for these discrepancies has to do with weights assigned to points in determining 
the solution. No precise statement can be made about weights assigned to 
points when an intuitive-graphical method of rotation is employed. Factorists 
intuitively attach somewhat more or less importance to some points than to
others, and arrive thereby at somewhat different solutions. In Carroll’s
solution in the orthogonal case the points are weighted in a manner proportional
to the squares of the communalities. This means that a test with a communality
of .8 will exert four times as much weight in determining the final solution as
a test with a communality of .4. These weights are probably substantially
different from those we intuitively assign when we employ a graphical method. 
Were we to apply Carroll’s method, or the methods of Saunders, Neuhaus and 
Wrigley, to normalized loadings we should in effect be assigning equal weight 
to each point. At any rate the problem of the discrepancy between the intui- 
tive-graphical and the analytical solution is probably a matter of weights and 
requires further study and clarification. It should be added that the weights 
assigned by these various methods are implicit in the definition of parsimony 
outlined in this paper, and a modification of the weighting system implies 
some modification in that definition. 
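
One way to see this weighting is to compare the contributions of two tests to the sum of fourth powers before and after normalization. The Python sketch below is an added illustration; the two tests, the function names, and the reading of "weight" as contribution to the criterion are assumptions made for the example.

```python
import math


def contribution(loadings):
    """Contribution of one test to the sum of fourth powers of loadings."""
    return sum(a ** 4 for a in loadings)


def normalized(loadings):
    """Loadings divided by the square root of the communality."""
    h = math.sqrt(sum(a * a for a in loadings))
    return [a / h for a in loadings]


# Hypothetical tests with the same direction in the factor space (variances
# in the ratio 3:1) but with communalities of .8 and .4 respectively.
high = [math.sqrt(0.6), math.sqrt(0.2)]
low = [math.sqrt(0.3), math.sqrt(0.1)]

raw_ratio = contribution(high) / contribution(low)
normalized_ratio = contribution(normalized(high)) / contribution(normalized(low))

print(raw_ratio, normalized_ratio)
```

With raw loadings the first test contributes in proportion to the square of its communality; after normalization the two tests contribute equally.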

Quite recently Thurstone (10) has published an analytical method for 
obtaining simple structure. Thurstone employs a criterion function which is 
in effect a measure of parsimony, although differently defined from the meas- 
ures suggested in this paper. Thurstone attempts to deal with the question of 
weights, a system of weights being incorporated into the definition of the 
criterion function. No very precise rationale determines the selection of 
weights, these being rather arbitrary. None the less, the method appears to be 
quite successful and yields results which approximate closely to those obtained 
by the more laborious intuitive-graphical methods. 

Clearly much further work is required not only on the development and 
comparison of analytical solutions but also on the rationale underlying such 
solutions. We appear, however, to be rapidly approaching a time when an 
acceptable objective solution will be available which is computationally prac- 
tical and well grounded theoretically. A next step will be to proceed directly 
from the correlation matrix to the final solution without the computation of 
the intermediate centroid solution. I have pondered this problem briefly but 
with little enlightenment.


THE MAXIMUM EXPECTED CORRELATION BETWEEN
TWO MULTIPLE-CHOICE TESTS 


PAUL HORST


UNIVERSITY OF WASHINGTON 


A formula is derived which gives the maximum expected correlation be- 
tween two multiple-choice tests as a function of the distributions of proportions 
correct for the items in the two tests and the probability of chance success. The 
formula is similar to one derived by Carroll based on “true” item difficulties.
A numerical example is provided. 


It is well known that the maximum correlation which it is possible for any 
two tests to have is a function of the reliability of the two tests. It is also 
known that the reliability of a test is related to the probability of answering 
the items in it correctly by chance. The relationships involved have been 
considered by Carroll (1) and Plumlee (3). 

Ferguson (2) and Carroll (1) have also shown how the correlation between 
two tests is in part a function of the distribution of item difficulties in the 
tests. As a matter of fact, Carroll’s equation 37 expresses the maximum 
correlation between two tests, all of whose items measure a single ability, as a 
function of the “true” difficulties of the items and the probability of answering
the items correctly by chance. Carroll’s formula is given in terms of item error 
scores and assumes that the probability of chance success is the same for all 
items in both tests. Most of the tests we deal with in psychological measure- 
ment are of the multiple-choice type involving chance success. In such cases, 
the interpretation of correlations among tests would be facilitated if we knew 
the maximum correlations to be expected when the distribution of item diffi- 
culties and the probability of chance success are known. 

Considering the fundamental importance of Carroll’s formula in the 
interpretation of intercorrelations among multiple-choice tests, it is surprising 
that it is not more generally used. Perhaps one of the reasons is that the 
formula is given in terms of the “true” difficulties rather than in terms of the
observed proportions answering the items correctly. 

It is the purpose of this note to express the formula in terms of the ob- 
served proportions. As is customary, we shall assume that the score on a test 
is the total number of items answered correctly and that all persons attempt 
all items in the tests. We also assume that the probability of chance success is 
the same for all items. We may assume, as is customary, that this probability 
is the reciprocal of the number of choices for an item; although, as in the case 
of the Carroll formula, our formula makes no assumption as to how the
probability of chance success is determined.

We shall first define the notation to be used in the formula. We let

$n_X$ = the number of items in test X,

$n_Y$ = the number of items in test Y,

$M_X$ = the mean of test X,

$M_Y$ = the mean of test Y,

$P_{X_i}$ = the proportion who answer item $i$ of test X correctly,

$P_{Y_i}$ = the proportion who answer item $i$ of test Y correctly,

$C$ = the probability of answering an item correctly by chance,

$r_m$ = the maximum expected correlation between tests X and Y, assuming
all items measure the same function.


We shall assume that the P values for the items of test X and of test Y have been
serially arranged in descending order of magnitude. Then we define $\sum iP_{X_i}$ as
the sum of the products of the P values for test X by the corresponding serial
order. $\sum iP_{Y_i}$ has the same interpretation for test Y. We also assume that the P
values for both tests have been pooled and serially arranged in descending order
of magnitude, so that $\sum iP_{T_i}$ is interpreted in terms of all items from both tests.
The formula for maximum expected correlation then is


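\[
r_m = \frac{(1 - C)\left[\sum iP_{T_i} - \sum iP_{X_i} - \sum iP_{Y_i} - \dfrac{(M_X - Cn_X)(M_Y - Cn_Y)}{1 - C} - Cn_Xn_Y\right]}
{\sqrt{2(1 - C)\sum iP_{X_i} - M_X(1 + M_X - 2Cn_X) - Cn_X(n_X - 1)}\;\sqrt{2(1 - C)\sum iP_{Y_i} - M_Y(1 + M_Y - 2Cn_Y) - Cn_Y(n_Y - 1)}}.
\tag{A}
\]
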
If we assume that the probability of chance success is zero, then formula A
becomes simply

\[
r_m = \frac{\sum iP_{T_i} - \sum iP_{X_i} - \sum iP_{Y_i} - M_XM_Y}
{\sqrt{2\sum iP_{X_i} - M_X(1 + M_X)}\;\sqrt{2\sum iP_{Y_i} - M_Y(1 + M_Y)}}.
\tag{B}
\]

Formula B is equivalent to equation 17 given by Carroll (1). However, the latter
formula is given in terms of proportion incorrect rather than proportion correct.

To illustrate the use of Formula A, suppose we have 4 items in test X and 
3 items in test Y. Assume also that the probability of chance success is .20. 
Suppose that the proportions correct for the items are as follows: 











    Test X    Test Y    Both tests
     .90       .80         .90
     .70       .60         .80
     .50       .40         .70
     .30                   .60
                           .50
                           .40
                           .30

Then

\[ M_X = \sum P_{X_i} = 2.40, \qquad M_Y = \sum P_{Y_i} = 1.80, \]

\[ \sum iP_{X_i} = 5.00, \qquad \sum iP_{Y_i} = 3.20, \qquad \sum iP_{T_i} = 14.00. \]


Substituting in formula A we have

\[
r_m = \frac{.80\left[14.00 - 5.00 - 3.20 - \dfrac{(2.40 - .80)(1.80 - .60)}{.80} - 2.40\right]}
{\sqrt{1.6 \times 5.00 - 2.40(3.40 - 1.60) - 2.40}\;\sqrt{1.6 \times 3.20 - 1.80(2.80 - 1.20) - 1.20}}
\]

or

\[ r_m = .693. \]
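
The substitution in formula A can be checked mechanically. The following Python sketch is a reader's aid, not part of the original note; the function names `ranked_sum` and `max_expected_r` are illustrative. It evaluates formula A directly from the two lists of proportions correct and, for the data of the example, gives a value of about .693.

```python
from math import sqrt


def ranked_sum(props):
    """Sum of i * P_i with the proportions sorted in descending order."""
    ordered = sorted(props, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))


def max_expected_r(px, py, c):
    """Formula A: maximum expected correlation between tests X and Y."""
    nx, ny = len(px), len(py)
    mx, my = sum(px), sum(py)          # test means M_X and M_Y
    sx, sy = ranked_sum(px), ranked_sum(py)
    st = ranked_sum(px + py)           # pooled items from both tests
    numerator = (1 - c) * (st - sx - sy
                           - (mx - c * nx) * (my - c * ny) / (1 - c)
                           - c * nx * ny)
    var_x = 2 * (1 - c) * sx - mx * (1 + mx - 2 * c * nx) - c * nx * (nx - 1)
    var_y = 2 * (1 - c) * sy - my * (1 + my - 2 * c * ny) - c * ny * (ny - 1)
    return numerator / sqrt(var_x * var_y)


P_X = [0.90, 0.70, 0.50, 0.30]
P_Y = [0.80, 0.60, 0.40]
r_a = max_expected_r(P_X, P_Y, c=0.20)
print(round(r_a, 3))
```

Sorting inside `ranked_sum` means the caller need not supply the proportions already in descending order.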


Suppose now that the probability of chance success is zero. Then formula B 
would apply. Substituting the data in (B) we have 
\[
r_m = \frac{14.00 - 5.00 - 3.20 - 2.40 \times 1.80}
{\sqrt{10.00 - 2.40 \times 3.40}\;\sqrt{6.40 - 1.80 \times 2.80}}.
\]

It is interesting to note that, with the given distributions of proportions 
correct for the items in each test, the maximum possible correlation is only 
.744 while the chance factor of .20 reduces the expected maximum by only .051. 


Proof of Formula A 


Suppose we let

$\sigma_T^2$ = the maximum variance of a test with a given distribution of item
difficulties, assuming no chance successes,

$\sigma_0^2$ = the maximum expected variance of a test with a given distribution of
item difficulties, assuming chance successes,

$n$ = the number of items in the test,

$1$ = a column vector with all unit elements,

$V$ = a column vector of the numbers from 1 to $n$,

$p$ = a column vector of proportions knowing the correct answers, arranged
in descending order of magnitude,

$P$ = a column vector of proportions correct for the items, arranged in
descending order of magnitude,

$C$ = the probability of chance success,

$M$ = the mean of the distribution of observed scores.

Equivalent to Carroll’s equation 6 we have

\[ \sigma_T^2 = 2V'p - 1'p - (1'p)^2. \tag{1} \]

Corresponding to Carroll’s equation 35 we have

\[ \sigma_0^2 = (1 - C)\left[(1 - C)\sigma_T^2 + C(n - 1'p)\right]. \tag{2} \]


It is well known (3) that the relationship between proportion correct and
proportion knowing the answer may be expressed by

\[ p = \frac{P - C1}{1 - C}. \tag{3} \]

Substituting (3) in (1) gives

\[ \sigma_T^2 = \frac{2(V'P - CV'1)}{1 - C} - \frac{1'P - Cn}{1 - C} - \frac{(1'P - Cn)^2}{(1 - C)^2}. \tag{4} \]

But because of the definition of V it can readily be proved that

\[ V'1 = \frac{n(n + 1)}{2}. \tag{5} \]

It is also obvious that

\[ 1'P = M. \tag{6} \]


Substituting (5) and (6) in (4) we get

\[ \sigma_T^2 = \frac{1}{1 - C}\left[2V'P - M - \frac{(M - Cn)^2}{1 - C} - Cn^2\right]. \tag{7} \]

Substituting (3), (6), and (7) in (2) gives

\[ \sigma_0^2 = 2(1 - C)V'P - M(1 + M - 2Cn) - Cn(n - 1). \tag{8} \]
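
The step from (1), (2), and (3) to the closed form (8) can be verified numerically. The Python sketch below is offered only as a check; the function names are illustrative, and the data are the test X proportions of the numerical example with C = .20. It computes the maximum expected variance twice, once by converting to proportions knowing the answer and once from the observed proportions alone.

```python
def variance_via_true_proportions(P, c):
    """sigma_0^2 from equations (1)-(3): convert P to p, then apply (1) and (2)."""
    n = len(P)
    P = sorted(P, reverse=True)
    p = [(x - c) / (1 - c) for x in P]                        # equation (3)
    vp = sum(i * x for i, x in enumerate(p, start=1))         # V'p
    sigma_t2 = 2 * vp - sum(p) - sum(p) ** 2                  # equation (1)
    return (1 - c) * ((1 - c) * sigma_t2 + c * (n - sum(p)))  # equation (2)


def variance_closed_form(P, c):
    """sigma_0^2 from equation (8), using observed proportions only."""
    n = len(P)
    P = sorted(P, reverse=True)
    vP = sum(i * x for i, x in enumerate(P, start=1))         # V'P
    m = sum(P)                                                # equation (6)
    return 2 * (1 - c) * vP - m * (1 + m - 2 * c * n) - c * n * (n - 1)


P_X = [0.90, 0.70, 0.50, 0.30]
a = variance_via_true_proportions(P_X, 0.20)
b = variance_closed_form(P_X, 0.20)
print(round(a, 4), round(b, 4))
```

Both routes give 1.28 for these data, which is the quantity under the first radical in the worked substitution.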


Suppose now we let

$V_T$ = a column vector of the numbers from 1 to $n_X + n_Y$,

$p_T$ = a column vector of the pooled proportions from both tests knowing
the correct answers, arranged in descending order of magnitude,

$P_T$ = a column vector of the pooled proportions from both tests giving
the correct answers, arranged in descending order of magnitude,

$G_T$ = the maximum possible covariance between two tests with given
distributions of item difficulties,

$G_0$ = the maximum expected covariance between two tests with given
distributions of item difficulties, assuming chance success.

We have, corresponding to the numerator term of Carroll’s equation 17,

\[ G_T = V_T'p_T - V_X'p_X - V_Y'p_Y - (1'p_X)(p_Y'1). \tag{9} \]

Using (3) and (6) in (9)

\[
G_T = \frac{1}{1 - C}\left[V_T'P_T - CV_T'1 - V_X'P_X + CV_X'1 - V_Y'P_Y + CV_Y'1
- \frac{(M_X - Cn_X)(M_Y - Cn_Y)}{1 - C}\right].
\tag{10}
\]

Because of the definition of $V_T$ we have

\[ V_T'1 = \frac{(n_X + n_Y)(n_X + n_Y + 1)}{2}. \tag{11} \]

Substituting (5), with appropriate subscripts, and (11) in (10) we get after
some algebraic manipulation

\[
G_T = \frac{1}{1 - C}\left[V_T'P_T - V_X'P_X - V_Y'P_Y
- \frac{(M_X - Cn_X)(M_Y - Cn_Y)}{1 - C} - Cn_Xn_Y\right].
\tag{12}
\]
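
The manipulation leading to (12) turns on a single simplification, written out here as a reader's aid rather than as part of the original derivation. By (5) and (11), the three terms of (10) that involve $CV'1$ combine as

\[
-CV_T'1 + CV_X'1 + CV_Y'1
= -\frac{C}{2}\left[(n_X + n_Y)(n_X + n_Y + 1) - n_X(n_X + 1) - n_Y(n_Y + 1)\right]
= -Cn_Xn_Y,
\]

since the bracket expands to $2n_Xn_Y$. With $n_X = 4$, $n_Y = 3$, and $C = .20$, as in the numerical example, the term is $-.20 \times 12 = -2.40$.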


It can readily be deduced from Carroll’s equations 27 through 30 that

\[ G_0 = (1 - C)^2 G_T. \tag{13} \]


Now the maximum expected correlation between two tests with given dis- 
tributions of proportions correct will be the ratio of the maximum expected 
covariance to the geometric mean of the maximum expected variances. There- 
fore we have 


\[ r_m = \frac{G_0}{\sigma_{0X}\,\sigma_{0Y}}. \tag{14} \]

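Substituting from (8), (12), and (13) in (14)

\[
r_m = \frac{(1 - C)\left[V_T'P_T - V_X'P_X - V_Y'P_Y - \dfrac{(M_X - Cn_X)(M_Y - Cn_Y)}{1 - C} - Cn_Xn_Y\right]}
{\sqrt{2(1 - C)V_X'P_X - M_X(1 + M_X - 2Cn_X) - Cn_X(n_X - 1)}\;\sqrt{2(1 - C)V_Y'P_Y - M_Y(1 + M_Y - 2Cn_Y) - Cn_Y(n_Y - 1)}}.
\tag{15}
\]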


Equation 15 is the same as Formula A except that the matrix notation is used 
to indicate product summations. 
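
As a cross-check on the proof, the same correlation can be reached without the closed forms by working from the proportions knowing the answer, using equations (1), (2), (3), (9), (13), and (14) in turn. The Python sketch below does this for the data of the numerical example; the function names are illustrative and are not taken from the original note.

```python
from math import sqrt


def ranked(values):
    """V'x: sum of rank * value, with values sorted in descending order."""
    ordered = sorted(values, reverse=True)
    return sum(i * v for i, v in enumerate(ordered, start=1))


def true_props(P, c):
    """Equation (3): proportions knowing the answer."""
    return [(x - c) / (1 - c) for x in P]


def sigma0_sq(P, c):
    """Maximum expected variance by equations (1) and (2)."""
    p = true_props(P, c)
    sigma_t2 = 2 * ranked(p) - sum(p) - sum(p) ** 2
    return (1 - c) * ((1 - c) * sigma_t2 + c * (len(P) - sum(p)))


def r_max(PX, PY, c):
    """Maximum expected correlation by equations (9), (13), and (14)."""
    px, py = true_props(PX, c), true_props(PY, c)
    g_t = ranked(px + py) - ranked(px) - ranked(py) - sum(px) * sum(py)
    g_0 = (1 - c) ** 2 * g_t
    return g_0 / sqrt(sigma0_sq(PX, c) * sigma0_sq(PY, c))


r = r_max([0.90, 0.70, 0.50, 0.30], [0.80, 0.60, 0.40], 0.20)
print(round(r, 3))
```

The result, about .693, agrees with the value obtained by substituting the observed proportions in formula A.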
Fifty-three tests designed to measure aspects of creative thinking were
administered to 410 air cadets and student officers. The scores were
intercorrelated and 16 factors were extracted. Orthogonal rotations resulted in 14
identifiable factors, a doublet, and a residual. Nine previously identified factors
were: verbal comprehension, numerical facility, perceptual speed, visualization,
general reasoning, word fluency, associational fluency, ideational fluency, and a
factor combining Thurstone’s closure I and II. Five new factors were
identified as originality, redefinition, adaptive flexibility, spontaneous flexibility, and
sensitivity to problems.


This study is one of a series designed to explore abilities considered to be 
important in the success of high-level personnel. For reports of other studies 
see (5, 9). In this study an attempt was made to isolate and define abilities in 
the domain of creative thinking, particularly as it applies to science, engineer- 
ing, and invention. Lack of space prevents our reporting all phases of this 
study in detail here. For more detailed information see (7, 8, 16). 


*Under Contract N6onr-23810 with the Office of Naval Research. The opinions ex- 
pressed are our own and are not necessarily shared by the Office of Naval Research. These 
studies are under the direction of J. P. Guilford. Paul R. Christensen is assistant director. 
Robert C. Wilson has been principally responsible for the conduct of this particular study. 
Donald J. Lewis contributed to the development of hypotheses and tests. Raymond M. 
Berger made substantial contributions to the development of the tests. 

The authors are very much indebted to the Personnel Research Laboratory, Human 
Resources Research Center, Air Training Command, Lackland Air Force Base, Texas, for 
making the testing possible, and in particular to Dr. Lloyd G. Humphreys, Director, and 
to Mr. William B. Lecznar, Technical Aide. 

Acknowledgement is made to Gordon Taaffe for the supervision of much of the sta- 
tistical work connected with this study and to Norman W. Kettner for carrying out the 
extractions of factors and for valuable assistance on the rotations of axes. 

A fuller discussion of this factor analysis is given by Wilson (16). 
Hypotheses and Tests for This Study


As in other studies of this type, particularly when exploring a relatively 
unknown domain, considerable time is spent in hypothesizing concerning the 
factors to be expected in the domain and concerning their properties. This is 
necessary for the sake of deriving test ideas, which provide a comprehensive 
coverage of the domain and which provide enough tests of a kind to bring out 
each common factor. We thus expected as many as eight new common factors, 
which are listed by name in the following outline. Under each is given some of 
the variations or subsidiary qualities that might be important aspects of such 
a factor. 


I. Sensitivity to problems 
A. Seeing defects, needs, deficiencies 
B. Seeing the odd, the unusual, etc. 
C. Seeing what needs to be done 


II. Fluency 
A. With simple restrictions and limited potential 
B. With simple restrictions and large potential 
C. With complex restrictions and limited potential 
D. With complex restrictions and large potential 


III. Flexibility 
A. Adaptability to changing instructions 
B. Freedom from inertia of thought 
C. Spontaneous shifting of set 


IV. Originality
A. Production of uncommon responses
B. Production of remote, unusual, unconventional associations
C. Cleverness


V. Penetration
A. Seeing remote implications


VI. Analysis
A. Perceptual analysis of perceived objects
B. Conceptual analysis of verbal material


VII. Synthesis
A. Production of perceived objects (Thurstone’s closure-I factor)
B. Production of conceptual objects
C. Production of logical or meaningful order


VIII. Redefinition
A. Perceptual reorganization
B. Shift of function
C. Moving part from one whole to another.


Table I presents minimum descriptions of the tests. Tests 1-34 were new
ones designed for this study, while tests 35-53 (except for 41 and 53) were
included as reference tests to determine previously known factors. The
hypothesis and its particular variation for which each test was constructed or the
factor it was expected to identify are indicated. An estimate of reliability of
each test score is given in the last column. Many of the reliability coefficients
are lower than one would like to see in tests used in factor analysis, but they
are tolerated for two reasons: (1) the tests generally had to be short in order
to include a large battery in a limited testing time; and (2) it is likely that some
of these tests, by their very natures, cannot be made highly reliable even when
made intolerably long.


TABLE I

SUMMARY OF TEST REQUIREMENTS AND CORRESPONDING HYPOTHESES

| Test+ | Task Required for Item | hfc* | r11 |
|---|---|---|---|
| 1. Sentence Analysis | List all facts or assumptions contained in simple sentences. | VI B | .66 |
| 2. Paragraph Analysis | Analyze paragraph into five basic ideas. | VI B | .62 |
| 3. Figure Analysis | Pick out objects jumbled together in drawing with lines in common. | VI A | .72 |
| 4. Figure Concepts Test (uncommonness) | Find features in common in pictures of objects; scored for uncommon responses. | IV A | |
| 5. Impossibilities Test | List things that are impossible. | II D | |
| 6. Plot Titles (low quality) | Write titles for story plots. Score is the number of low-quality titles written. | II D | .84 |
| 7. Plot Titles (cleverness) | Score is the number of clever titles written. | IV C | .57 |
| 8. Common Situations Test | List problems suggested by everyday situations. | I A | .90 |
| 9. Brick Uses (fluency) | List different uses for a brick. Score is number listed. | I B | |
| 10. Brick Uses (flexibility) | Score is the number of times the class of uses is changed. | III C | |
| 11. Number Associations (uncommonness) | List associations for given numbers. Score is the number of statistically uncommon responses. | IV A | .56 |
| 12. Consequences Test (low quality) | List consequences of certain changes. Score is the number of more obvious consequences. | II B | .68 |
| 13. Consequences Test (remoteness) | Score is the number of indirect or remote consequences listed. | V A | .66 |
| 14. Circle Square I | Manipulate round and square objects according to a rule. | II A | .97 |
| 15. Circle Square II | Manipulate round and square objects. Rule changes from item to item. | II A | .83 |
| 16. Match Problems | Take away matches and leave certain number of squares or triangles. | I B | .38 |
| 17. Sign Changes | Substitute one arithmetic operation for another and solve simple equations. | II A | .89 |
| 18. Implied Uses | Give several secondary meanings of words. | I B | .67 |
| 19. Quick Response (uncommonness) | Word associations scored for uncommonness. | IV A | .81 |
| 20. Associations I | Associate single word with given pair; completion response. | IV B | .87 |
| 21. Associations II | Same as I, but word must have double meaning. Multiple choice of first letter of correct word. | IV B | .62 |
| 22. Unusual Uses | List different uses for common objects. | IV B | .80 |
| 23. F-Test | E sets his own problem in doing each item, all different in kind. | I C | .81 |
| 24. Apparatus Test | Suggest two improvements on each of several appliances. | I A | |
| 25. Social Institutions (direct implications) | Suggest two improvements on each of several institutions; scored for number of direct improvements. | I A | |
| 26. Social Institutions (indirect implications) | Same as 25, scored for number of indirect or remote improvements. | V A | .46 |
| 27. Sentence Gestalt I (errors of omission) | Separate words run together in continuous discourse. | VIII A | .56 |
| 28. Word Transformation | Regroup letters in series of words, without changing order, to form new set. | VIII C | .92 |
| 29. Gestalt Transformation | Indicate object having part which will serve a specified purpose. | VIII B | |
| 30. Picture Gestalt | Indicate object in photograph or part which will serve a specified purpose. | VIII B | .41 |
| 31. Object Synthesis | Combine two objects to make a new object. | VI B | .72 |
| 32. Concept Synthesis | Combine two ideas to suggest a new idea. | VII B | .61 |
| 33. Sentence Synthesis | Make sentence out of words in scrambled order. | VII C | .87 |
| 34. Word Matrices | Complete matrix of words following a pattern. | VII C | .87 |
| 35. Punched Holes ++ | Indicate pattern of holes in unfolded paper which was punched while folded. | Visualization | .85 |
| 36. Mutilated Words Test ++ | Identify word composed of partial letters. | VII A, closure I | .80 |
| 37. Street Gestalt Completion Test ++ | Identify objects with parts missing. | VII A, closure I | .62 |
| 38. Perceptual Speed ++ | Which object is the same as the one given. | Perceptual speed | .92 |
| 39. Controlled Associations ++ | Write several synonyms for each given word. | Associational fluency | .82 |
| 40. Disarranged Words ++ | Make word out of scrambled letters. | Word fluency | .64 |
| 41. Unusual Details | Indicate anomalous features of pictures. | I B | .66 |
| 42. Penetration of Camouflage ++ | Locate faces hidden in pictures. | VI A, closure II | .83 |
| 43. Vocabulary ++ | Indicate meaning of word presented in brief context. | Verbal comprehension | .78 |
| 44. Ship Destination ++ | Find best port for ship, considering several variables. | General reasoning | .80 |
| 45. Symbol Manipulation ++ | Mark symbolically presented “If . . ., then” statements true or false. | Symbolic thinking | |
| 46. Inference Test ++ | Select correct conclusion from given statement. | Logical reasoning | |
| 47. Spatial Orientation Test (Part I) ++ | Locate given section on aerial photograph. | Perceptual speed | .63 |
| 48. Practical Judgment ++ | Find best solution to practical problem. | Judgment | .43 |
| 49. Numerical Operations Test (Part I) ++ | Simple addition and multiplication. | Numerical facility | .64 |
| 50. Numerical Operations Test (Part II) ++ | Simple subtraction and division. | Numerical facility | .79 |
| 51. Mechanical Principles ++ | Apply simple mechanical principles to solve problems. | Visualization and general reasoning | .82 |
| 52. Arithmetic Reasoning ++ | Solve arithmetic-reasoning problems. | Numerical facility and general reasoning | .79 |
| 53. Sentence Gestalt I (number right) | Separate words run together in continuous discourse. | VIII A | .84 |

* hfc: hypothesized factor content (see preceding outline for key to the hypotheses).

+ We are indebted to Professor L. L. Thurstone for his permission to reproduce tests 35,
36, 37, 39, 40 and 43. Sentence Gestalt I (tests 27, 53) is patterned after a test developed
by H. M. Grayson. We wish to thank Constance D. Lovell for making available to us a
number of classes in beginning psychology for preliminary experimental testing.

++ Tests included as reference tests because of the known factor content.


Test Administration, Scoring, and Factor Analysis 


The experimental battery was administered by experienced test-adminis- 
tration teams of the Human Resources Research Center, Lackland Air Force 
Base, Texas, to groups of air cadets and student officers. Comparison of the 
sub-samples in terms of age, education, and distributions of scores on tests 
indicated that they could justifiably be combined for a single factor analysis. 

Most of the experimental tests required completion-type answers. It was 
felt that, in general, in order to measure creative abilities most effectively, it 
was important that something be produced by the examinee. Efforts were 
made to derive more than one score from a test where possible and to assure 
that such scores had sufficiently unique variances to justify their use in the 
same analysis. Further information concerning these special scores, particu- 
larly those used in measuring originality, will be found in (17). 

The coefficients of correlation were Pearson product-moment r’s except 
those involving variable 27. Because the obtained distribution for this variable 
was truncated it was dichotomized near the median, and biserial coefficients 
were computed.* 
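
The text does not state the computing formula used for the biserial coefficients. The Python sketch below assumes the usual textbook form, r_bis = ((M1 - M0) / s) (pq / y), where M1 and M0 are the means of the continuous variable in the upper and lower groups, s is its standard deviation, p and q are the group proportions, and y is the normal ordinate at the point of dichotomy. The scores are invented for illustration and are not data from the study.

```python
from statistics import NormalDist, mean, median, pstdev


def dichotomize(scores):
    """Split scores at the median: 1 above the median, 0 otherwise."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]


def biserial_r(x, d):
    """Biserial correlation of a continuous variable x with a 0/1 split d."""
    upper = [xi for xi, di in zip(x, d) if di == 1]
    lower = [xi for xi, di in zip(x, d) if di == 0]
    p = len(upper) / len(x)            # proportion in the upper group
    q = 1 - p
    z = NormalDist().inv_cdf(q)        # cut point with area p above it
    y = NormalDist().pdf(z)            # normal ordinate at the cut
    return (mean(upper) - mean(lower)) / pstdev(x) * (p * q / y)


# Hypothetical scores: a truncated variable and one other test.
truncated = [4, 3, 5, 6, 8, 2, 7, 5]
other = [12, 15, 9, 20, 17, 11, 14, 18]
d = dichotomize(truncated)
r_b = biserial_r(other, d)
print(d, round(r_b, 2))
```

Tied scores at the median fall in the lower group here, so the split is only approximately even, in keeping with a cut made near the median.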

The factor extractions were done on IBM equipment using Tucker’s 
adaptation of Hotelling’s iterative procedure for determining principal com- 
ponents (15). The highest coefficient in each column of the correlation matrix 
was used as the estimate of communality of the corresponding variable. Six- 
teen factors were extracted. 
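
The two computational steps just described can be illustrated in miniature. The Python sketch below places the highest coefficient of each column in the diagonal as the communality estimate, then finds the first factor by repeated multiplication of a trial vector, which is the core idea of Hotelling's iterative procedure. It is a simplified illustration under stated assumptions: the details of Tucker's adaptation are not given in the text, absolute values are used in choosing the highest coefficient, the removal of each factor before extracting the next is omitted, and the correlations are hypothetical.

```python
def with_communalities(r):
    """Copy of r with each diagonal entry replaced by the largest absolute
    off-diagonal coefficient in its column."""
    n = len(r)
    out = [row[:] for row in r]
    for j in range(n):
        out[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    return out


def first_factor(r, iterations=200):
    """Iterate v <- r v / |r v| and return the loadings of the first factor."""
    n = len(r)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    root = sum(v[i] * sum(r[i][j] * v[j] for j in range(n)) for i in range(n))
    return [x * root ** 0.5 for x in v]   # scale unit vector by sqrt of the root


R = [[1.00, 0.60, 0.50],
     [0.60, 1.00, 0.40],
     [0.50, 0.40, 1.00]]   # hypothetical correlations, not data from the study
reduced = with_communalities(R)
loadings = first_factor(reduced)
print([round(x, 2) for x in loadings])
```

The sum of the squared loadings equals the latent root of the reduced matrix, which gives a quick check that the iteration has converged.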

Zimmerman’s graphic orthogonal method was used in rotating the axes 
(18). The basic considerations guiding the rotations were Thurstone’s criteria 
of simple structure and positive manifold, the known factor content of the 
reference tests, and psychological meaningfulness. 


Interpretation of the Factors 


The factors are presented in the approximate order of their familiarity 
and definiteness. The interpretations rest principally upon those tests with 
loadings of .30 or greater. The test numbers, names, and factor loadings are 


*The correlation matrix, the centroid factor matrix, and the rotated factor matrix 
have been deposited with the American Documentation Institute, 1719 N. Street, N.W., 
Washington 6, D. C. Order Document No. 4156 remitting $1.25 for 35 mm. microfilm or 
$1.25 for 6 by 8 inch photocopies. 
listed at the beginning of the discussion of each factor for those tests that define
the factor. The well-known reference factors are listed without comment. A
loading marked with an asterisk is the highest one for the test in question.


Factor A Verbal Comprehension (V)


43 Vocabulary                                     .65*
27 Sentence Gestalt I (error score, reflected)    .53*
33 Sentence Synthesis                             .53*
46 Inference Test                                 .47*
53 Sentence Gestalt I (right score)               .43
15 Circle Square II                               .36
32 Concept Synthesis                              .30*
11 Number Associations (uncommonness)             .31


Factor B Numerical Facility (N)


50 Numerical Operations Test (Part II)
49 Numerical Operations Test (Part I)             .72*
52 Arithmetic Reasoning                           .49*
47 Spatial Orientation Test (Part I)              .37


Factor C Perceptual Speed (P)


38 Perceptual Speed                               .56*
47 Spatial Orientation Test (Part I)              .47*
42 Penetration of Camouflage                      .45*
37 Street Gestalt Completion Test


Factor D Visualization (Vz)


51 Mechanical Principles                          .54*
35 Punched Holes                                  .45*
52 Arithmetic Reasoning
48 Practical Judgment                             .32*


Factor E General Reasoning (GR)


44 Ship Destination                               .42*
34 Word Matrices                                  .38*
52 Arithmetic Reasoning
51 Mechanical Principles                          .33


Factor F Closure (C1) and (C2)


37 Street Gestalt Completion Test                 .44*
42 Penetration of Camouflage                      .40
36 Mutilated Words                                .35


This factor seems to represent a combination of Thurstone’s two gestalt
factors, closure I, speed and strength of closure, and closure II, flexibility of
closure. While the loadings on this factor are not high, the only tests with
significant loadings are the three tests that were included as references for the
two closure factors.


Factor G Word Fluency (W)


53 Sentence Gestalt (right score)                 .56*
[illegible]
[illegible]
40 Disarranged Words
36 Mutilated Words
[illegible]


This factor seems to involve the ability to produce words fulfilling certain 
structural requirements. In contrast to the next factor, associational fluency, 
there is little or no emphasis here on producing words to meet specific require- 
ments of meaning. Four of the six tests involve the manipulation of letters or 
words meeting specific physical restrictions, i.e., they must involve certain 
letters and be real words. The loadings of the tests, Circle Square I and Circle 
Square II, are less easily accounted for, but are somewhat consistent with this
interpretation since the tasks they impose do not emphasize meanings but the 
manipulation of circles and squares identified with certain objects. 

This factor is identified with Thurstone’s factor W since the principal 
test loadings in his analysis were for similar tests. Zimmerman (19), in a 
rerotation of Thurstone’s Primary Mental Abilities Battery, renamed this 
factor letter fluency. In the rerotations, Thurstone’s unidentified factor 10 
emerged as a factor similar to our factor H, associational fluency, which Zim- 
merman called verbal fluency. Fruchter (4), in a factor-analytic study of the 
nature of verbal fluency, found a similar pair of factors which he called word 
fluency and speed of calling up pertinent associations. An examination of these 
two factors also reveals a division between tests involving word structures 
and those involving word meanings. 


Factor H Associational Fluency (AF) 


39 Controlled Associations                        .46*
27 Sentence Gestalt (error score, reflected)      .34
11 Number Associations (uncommonness)             .33*
28 Word Transformation
40 Disarranged Words


The tests in this list, in contrast with those for the previous factor, all 
seem to require the production of words that meet specific requirements of 
meaning. Controlled Associations requires the examinee to write a number of 
words that are similar in meaning to a given word. Number Associations 
(uncommonness) was scored for the number of relevant but relatively unusual 
words or associations with which the examinees responded to a stimulus 
number. In the Sentence Gestalt test the subject is presented with a simple 
passage of continuous discourse in which the words are all run together. The 
examinee’s task is to draw lines separating the words. In accomplishing this, 
the individual who concentrates on meaning tends to make fewer errors of 
omission, but does not necessarily make a higher rights score. The individual 
who separates the words rather on the basis of their superficial appearance
tends to make a high rights score. The Word Transformation and Disarranged
Words tests seem to involve both of these abilities although their major loadings
are on the word-fluency factor. The associational-fluency factor may be
distinguished from the verbal-comprehension factor in that it emphasizes the
production of words rather than the recognition of words.

Factor I Ideational Fluency (IF) 


 6 Plot Titles (low quality)
 8 Common Situations
12 Consequences Test (low quality)                .55*
 9 Brick Uses (fluency)
[illegible]
 1 Sentence Analysis


The tests with significant loadings on this factor all present a task in 
which the examinee is asked to call up as many ideas as possible in the time 
allowed. The score for all tests except Plot Titles and the Consequences Test 
was the total number of relevant responses given. In the case of Plot Titles 
and the Consequences Test two scores were derived from each test. The high- 
quality responses were scored separately from the low-quality responses. The 
Plot Titles test requires the examinee to list as many appropriate titles as 
possible for two simple story plots. The responses were then rated for clever- 
ness. The two scores derived were: (1) the number of clever titles; and (2) the 
number of low-quality or non-clever titles. The latter score is the one repre- 
sented on this factor. Two scores were also derived from the Consequences 
Test: (1) the number of remote or far-reaching consequences listed in response 
to certain hypothetical changes in our world; (2) the number of immediate or 
less remote consequences listed. The score represented on ideational fluency is 
the number of immediate consequences listed. The Brick Uses and Common 
Situations tests are quite similar to “Topics” and “Things Round” which
have been categorized by French (3) as tests of ideational fluency. The ability 
involved here seems to be the speed of calling up ideas in a situation in which 
there is relatively little restriction, and quality does not matter. 


Factor J Originality (O) 


 7 Plot Titles (cleverness)
19 Quick Response (uncommonness)                  .49*
13 Consequences Test (remote consequences)        .42*
 4 Figure Concepts (uncommonness)
[illegible]
[illegible]
 8 Common Situations
39 Controlled Associations                        .30
[illegible]
37 Street Gestalt Completion Test
30 Picture Gestalt (empirically keyed)           -.14
35 Punched Holes
29 Gestalt Transformation                        -.25
This is the hypothesized originality factor. It appears to be bipolar, as
shown by several small negative loadings, with the ability to produce uncom- 
mon, clever, or remote responses on the positive side. [Hargreaves (10) has 
found an originality factor in some of these kinds of tests.] The negatively 
related tests are all tests keyed with a tendency for the “right” response to
have been selected on an arbitrary, conventional basis. The individual who 
engages in an uncommon line of thought is likely to be penalized for his 
unconventional thinking on an arbitrarily keyed test. 

Of the seven tests constructed to measure hypothesized aspects of this 
factor, five of them, Plot Titles, Quick Response, Figure Concepts, Unusual 
Uses, and Associations I have significant loadings. The loading of the “remote
consequences” score of the Consequences test can be readily understood. It
was hypothesized that this test would emerge on a factor called penetration, 
defined as the ability to see beyond the obvious and immediate to the remote 
or far-reaching consequences of a situation. A penetration factor did not come
out in this analysis, but its definition is essentially the same as subhypothesis 
IVB for originality: remote, unusual, or unconventional associations. 

The Impossibilities test was designed as a fluency test, which it is, pri- 
marily, but it presents an unusual task. In order to respond at all the individ- 
ual must think along uncommon lines. In the Common Situations test it may 
be that the examinee himself censors some of the more trivial responses and 
favors uncommon ideas. In giving synonyms in the Controlled Associations 
test, the individual very rapidly exhausts the more obvious responses and 
must search for more far-fetched associations. 


Factor K Adaptive Flexibility (AX) 


[illegible]
[illegible]
[illegible]
[illegible]
26 Social Institutions (indirect implications)   -.16
10 Brick Uses (flexibility)                      -.17
[illegible]                                      -.20
[illegible]                                      -.22
[illegible]                                      -.24


The two tests with significant positive loadings on this factor seem to 
require the ability to change set to meet new requirements imposed by chang- 
ing problems. In both tests the examinee must change his set in order to solve 
successfully the problems presented. 

The negative side of the factor is defined by open-end tests whose prob- 
lems impose relatively little restriction on the responses. It is commonly 
thought that the opposite of flexibility of thinking is rigidity. For example,
Oliver and Ferguson have reported a factor of rigidity (or rather, lack of it) in 
tests similar to those leading in this list (11). The present study did not assume 
that rigidity is the direct opposite of flexibility and thus did not seek to include 
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any tests emphasizing rigidity in a positive manner. An examination of the 
tests with negative loadings on the list above does not at all suggest rigidity 
as a probable cause for making good scores on them. The nature of this pole 
opposite to adaptive flexibility is therefore an open question. Further work 
should be directed toward determining whether the bipolarity can be verified 
and, if so, what kind of test it takes to have a strong negative loading. 


Factor L Spontaneous Flexibility (SX) 


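10 Brick Uses (flexibility) . . . . . . . . . . . . . . . . .43* 
22 Unusual Uses . . . . . . . . . . . . . . . . . . . . . . .39* 
13 Consequences Test (remote consequences) . . . . . . . . [loading illegible]* 
 8 Common Situations . . . . . . . . . . . . . . . . . . . . .33 
18 Implied Uses . . . . . . . . . . . . . . . . . . . . . . .30* 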


This factor seems to represent the ability to produce a diversity of ideas. 
While it requires the individual to change set, it differs from adaptive flexibility, 
in that the direction of set change is not restricted. The individual is not 
required to arrive at one particular answer in order to be successful. All that 
is required is that he change set in some direction and the more often the 
better. It might be characterized as lability of ideas. 


Factor M Redefinition (Re) 


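29 Gestalt Transformation . . . . . . . . . . . . . . . . . [loading illegible] 
48 Practical Judgment . . . . . . . . . . . . . . . . . . . .31 
31 Object Synthesis . . . . . . . . . . . . . . . . . . . . [loading illegible]* 
30 Picture Gestalt . . . . . . . . . . . . . . . . . . . . . .30* 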


The interpretation of this factor is not completely clear. Two of the four 
tests with significant loadings, Gestalt Transformation and Picture Gestalt, 
were constructed for the purpose of measuring “redefinition,” the ability to 
shift the function of an object or part of an object and use it in a new way. 
The test Object Synthesis seems to require this same ability, since to make a 
new object out of two given objects the functions of the objects must be 
shifted. The USAF Practical Judgment test is somewhat consistent with this 
in that some of the items present problem situations in which certain materials 
are available for the solution and in improvising the solution the examinee 
may be required to utilize the objects in some way other than their most 
common use. This interpretation does not preclude the possibility that this 
factor may be the one called judgment in other analyses (6). 


Factor N Sensitivity to Problems (SP) 


25 Social Institutions (direct implications) . . . . . . . . .70* 
24 Apparatus Test . . . . . . . . . . . . . . . . . . . . . .59* 


The two tests with significant loadings on this factor were constructed to 
measure a subhypothesis of sensitivity to problems, IA, seeing defects, 
needs, and deficiencies. They seem to require the ability to recognize practical 
problems. Since the task required in both of these tests is quite similar, further 
investigation will be necessary to explore the generality of this factor. It is 
even possible that this factor is specific (see the discussion below). 


Factor O Doublet 


26 Social Institutions (indirect implications) . . . . . . . .45* 
[second test name and loading illegible in the source] 


No interpretation of the nature of this factor will be made. Social Insti- 
tutions (indirect implications) was scored to measure the hypothesized pene- 
tration factor, but no such factor emerged. It is not likely that this is a trace of 
a penetration factor because of the absence of other scores, such as the remote- 
consequences score on the Consequences Test, from this list. 

The Apparatus Test was not scored at two levels of directness because this 
did not seem feasible. Most of the problems seen with respect to machines 
seemed to be of the more direct type. It may be, however, that the Apparatus 
score contained problem-seeing variance at two levels of directness which go 
along with the separation made in the two level scores on Social Institutions. 
It does not seem likely that there are two seeing-problems factors of this sort. 
It is more likely that one of these factors is specific to this kind of test, al- 
though which one is specific would be hard to say. One or more problem-seeing 
tests of somewhat different character than either of these are needed in order 
to help decide which is the common factor. 


Factor P Residual 


(There were no loadings on this factor greater than .30 in absolute 
value.) 


Discussion 


Of the 14 identifiable factors, five were the well-known factors expected as 
by-products: verbal comprehension, numerical facility, perceptual speed, visual- 
ization, and general reasoning. The bulk of the discussion will be devoted to a 
brief review of the hypothesized abilities in the light of the nine obtained 
factors related to them. 

The first ability hypothesized was a sensitivity to problems. Factor N is 
defined by two of the test scores designed to measure the ability to see defects, 
needs, or deficiencies. Briefly, this factor seems to involve awareness of needs 
for changes and of defects or deficiencies in everyday situations. Two other 
subhypotheses concerning the nature of this factor were not sustained. One of 
these emphasized seeing the odd or unusual and the other emphasized seeing 
what needs to be done. The ability to see problems thus seems to be much 
more restricted than was expected. 

From previous factor analyses involving fluency tests, particularly those 
of Taylor (13), Carroll (2), Fruchter (4), and Zimmerman (9), at least three 
fluency factors were expected. Three factors were found, which verifies previ- 
ous findings. 

In hypothesizing about fluency factors it was noted that the tests vary in 
two respects: (1) the restrictions under which responses are given could be 
simple or complex, and (2) the potential number of responses could be large 
or small. It was expected that the fluency factors might separate somewhat 
along these lines. 

The obtained fluency factors did correspond roughly to some of these 
variations, but another meaningful principle of differences among them, not 
anticipated, is that they involve different levels of meaning. Factor G, word 
fluency, corresponds to subhypothesis IIC. (See outline of hypotheses above.) 
It seems to involve the speed of producing words that fulfill restrictive, 
structural requirements. By "structural" is meant that the letter combinations 
given are real words. Thus they satisfy minimum requirements as words. 
Meaning is of little consequence; familiarity with the perceptual pattern 
would apparently be sufficient. Factor H, associational fluency, corresponds to 
subhypothesis IIA. The tests loaded with this factor require the production 
of words meeting specific requirements of meaning. 

The tests designed to test hypotheses IIB and IID emerged on a single 
factor, factor I, which has been called ideational fluency. Both hypotheses 
prescribe a large potential, i.e., the number of possible responses is very large. 
It does not seem to matter whether the restrictions are simple or complex. 
The responses are more than single words. They need not be, and probably 
are not usually, verbatim-recall responses. The ideas themselves may be new 
to the examinee. 

In the hypotheses concerning flexibility of thinking, three kinds or occa- 
sions for such an ability were distinguished. Two factors that emerged seem 
close to two of these conceptions. The first of these has to do with changing 
one’s mental set to meet new requirements imposed by changing problems 
(factor K). It has been named adaptive flexibility. The solution of the items 
requires frequent and radical changes in principles. Several small negative 
projections appear on this axis, indicating that this factor is probably bipolar. 
Having a large amount of this ability is apparently a handicap in getting a 
good score in some tests. The tests with negative loadings, though the loadings 
are small, have in common the fact that they impose little restriction on the 
examinee. We should expect some kind of rigidity or persistence or persevera- 
tion as the quality opposite to flexibility, but none of these traits is at all 
obvious as a common property of the tests with negative loadings. The most 
obvious contrast between tests at the opposite poles of this factor is a neces- 
sary control and channeling of thinking toward one right answer at the positive 
end and freedom to think in many directions in tests at the negative end. At the 
positive end, too, there is a necessary breaking away from habitual sets, a 
feature that is merely absent at the other end. 

The second flexibility factor (factor L), called spontaneous flexibility, 
allows for a free change of set not directed toward the solution of a narrowly 
defined problem. All that is required to make a good score is that the examinee 
change the course of his ideas. He quickly and easily takes new directions 
with or without apparent good cause. This kind of flexibility might be charac- 
terized as lability of ideas. It is apparently not bipolar, but this possibility 
should be explored. 

In attempts to measure originality, tests were constructed to check on 
three alternative principles or approaches to measurement in this area: 
(1) uncommonness of responses as measured by weighting the responses of an 
individual according to the statistical infrequency of those responses in the 
group as a whole; (2) the production of remote, unusual, or unconventional 
associations in specially prepared association tests; and (3) cleverness of 
responses, as evaluated by ratings of degrees of cleverness of titles suggested 
for short story plots. 
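
The first of these principles, weighting responses by their statistical infrequency in the group, can be illustrated with a minimal sketch. The weighting rule used here (one minus the proportion of examinees giving a response) and the function name are assumptions made for illustration only; the text does not state the weights actually employed.

```python
from collections import Counter


def uncommonness_scores(responses_by_person):
    """Score each person by summing infrequency weights of distinct responses.

    Assumed weight for a response: 1 - (proportion of the group giving it).
    This rule is hypothetical; the study's actual weights are not given.
    """
    n_people = len(responses_by_person)
    counts = Counter()
    for responses in responses_by_person.values():
        counts.update(set(responses))  # count each response once per person
    weights = {resp: 1 - c / n_people for resp, c in counts.items()}
    return {
        person: sum(weights[r] for r in set(responses))
        for person, responses in responses_by_person.items()
    }


# Hypothetical responses to a "uses for a brick" item.
group = {
    "a": ["build wall", "doorstop"],
    "b": ["build wall"],
    "c": ["build wall", "paperweight"],
}
scores = uncommonness_scores(group)
```

Under this assumed rule a response given by every examinee contributes nothing, so examinee "b" scores zero while "a" and "c" each gain credit for one response offered by a third of the group.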

Tests that were designed for all three subhypotheses tend to have in 
common a single factor, factor J, which can be justifiably called originality. 
Inasmuch as tests representing all three methods of measuring originality have 
loadings on this factor, we may have some confidence in its generality. 

It should be noted that this factor has some appearance of bipolarity 
since there were a few small negative loadings of other tests in the battery on 
this factor. Those tests with negative loadings were tests of the kind whose 
"right" responses tend to be keyed on an arbitrary or conventional basis by 
the test constructor. The individual who engages in an unusual line of thought 
is likely to be penalized for his originality in such tests. In this connection, the 
essentially zero loading for originality in Associations II (as contrasted with 
the significant loadings in Associations I) is worth mentioning. In this test, 
also, one "correct" answer is given credit. It may be that the original 
examinees think of other appropriate responses whose initial letters appear among the 
alternatives, and for which they receive no credit. 

The fifth hypothesis concerns a separate factor of penetration or the 
ability to see remote consequences in space, in time, or in a causal chain of 
circumstances. No such factor emerged. The remote-consequences score of the 
Consequences Test came out with its highest loading on the originality factor. 
Evidently, the remoteness of ideas represented by this test score is not different 
from the remoteness of ideas required by the test scores hypothesized for 
originality. The indirect implications score of the Social Institutions test 
emerged on a doublet which was not interpreted. 

Hypothesis VI, which forecast a general analyzing ability, was represented 
in the investigation by tests involving the perceptual analysis of perceived 
objects and the conceptual analysis of both pictorial and verbal material. This 
hypothesis was not verified. The variances of analysis tests were distributed 
among various factors, including creative factors. 


The aspect of hypothesis VII, the synthesis hypothesis, that had to do 
with production of perceived objects, was substantiated by factor F, closure. 
Thurstone’s closure I factor was expected in this area. We were also testing 
whether this same factor would be so general as to appear in tests involving 
conceptual organizing or whether a separate synthesizing factor would be 
found in the realm of thinking. The only synthesis factor that we found seems 
actually to be a combination of Thurstone’s closure I (speed and strength of 
closure) and closure II (flexibility of closure), and to be confined to perceptual 
tests. There evidently were not enough tests for closure II in the battery to 
effect a separation. 

We believe that our selection of tests gave good opportunity for both 
analysis and synthesis factors to emerge in the domain of thinking. Since our 
findings are negative at these points, we are led to question the existence of 
psychological unities known as conceptual analysis and conceptual synthesis. 
If they do exist, it will probably require other types of tests to discover them. 

The last hypothesis concerned the redefinition or reconstruction of some- 
thing that exists. Factor M seems to fit this hypothesis, although it is better 
described as a redefinition ability than a reconstruction ability. It appears in 
tests that require common objects or parts of objects to be utilized in new and 
unusual ways. The presence of the test Practical Judgment with this factor 
was a surprise but this finding can be rationalized. This is the USAF Practical 
Judgment test that has, in previous analyses, defined a unique factor called 
judgment. It may be that the present redefinition factor and the judgment 
factor are one and the same. 


An attempt is made to elaborate upon Lev’s results concerning the relia- 
bility of the point biserial coefficient of correlation in a manner that will be 
helpful to the psychological statistician. Procedures required in the use of the 
non-central $t$ tables prepared by Johnson and Welch are described as they 
relate to the determination of the fiducial limits for a point biserial coefficient. 
A normal approximation technique for the estimation of fiducial limits is also 
suggested. Numerical evidence is presented which shows that relative to a 
given level of significance the width of the fiducial interval estimated from a 
point biserial coefficient of any size is smaller than that of the fiducial interval 
corresponding to an ordinary Pearsonian coefficient of the same magnitude. 


In psychological and educational statistics the product-moment coeffi- 
cient of correlation between a continuous variable y and a truly dichotomous 
variate x which assumes the values of 1 and 0 only is known as the point 
biserial coefficient of correlation. Probably the most widespread use of this 
coefficient occurs in item-analysis procedures in which a test question scored 
as 1 or 0 is correlated with a continuous variable such as the total score on a 
test or a measure on an external criterion. In the application of the point 
biserial coefficient of correlation to various areas of psychological research 
knowledge of the sampling behavior of this coefficient would obviously be 
desirable. 

Although apparently little has been written in the literature of psycho- 
logical statistics concerning the reliability of the point biserial coefficient, 
Lev (7) has presented a solution to the problem in the language of mathematical 
statistics through relating this coefficient to the non-central $t$ distribution. 
Lev has suggested the employment of tables of the non-central $t$ distribution 
prepared by Johnson and Welch (6) for determination of fiducial limits for 
the point biserial coefficient. However, it would appear that Lev’s presenta- 
tion could not only be restated but also expanded to a point that would permit 
psychologists to make greater use of his important findings. 

Problem. It is the purpose of the writers (1) to express Lev’s solution 
in a manner that would appear to be more meaningful and useful to psycho- 
logical statisticians than it is in its current form, (2) to illustrate the use of 
Table IV prepared by Johnson and Welch—especially as it relates to the 
determination of the fiducial limits of a point biserial coefficient, and (3) to 
suggest a normal approximation technique developed from a transformation 
proposed by Johnson and Welch that furnishes a relatively satisfactory means 
for estimating the fiducial limits for a point biserial coefficient. In addition 
preliminary numerical evidence is presented that points to the fact that cor- 
responding to values of the point biserial coefficient the fiducial interval at a 
specified probability level is narrower than it is for the ordinary Pearsonian 
coefficient of correlation, the sampling behavior of which is assumed to be 
consistent with the assumptions underlying the model defined by the normal 
bivariate surface. 


Definitions of symbols. Let 

$x$ = dichotomous variable assigned values of unity or zero; 
$y$ = continuous variable with which $x$ is correlated; 
$y_p$ = values in $y$ associated with $x = 1$; 
$y_q$ = values in $y$ associated with $x = 0$; 
$N_p$ = number of measures paired with $x = 1$; 
$N_q$ = number of measures paired with $x = 0$; 
$N = N_p + N_q$, the total number of observations in a sample; 
$n$ = number of degrees of freedom in a given sample; 
$P$ = proportion of $y$ values in the population paired with $x = 1$; 
$Q$ = proportion of $y$ values in the population paired with $x = 0$; 
$p$ = proportion of sample $y$ values paired with $x = 1$; 
$q$ = proportion of sample $y$ values paired with $x = 0$; 
$\mu_p$ = mean of the $y_p$ values in the population; 
$\mu_q$ = mean of the $y_q$ values in the population; 
$\bar{y}_p$ = mean of the $y_p$ values in the sample; 
$\bar{y}_q$ = mean of the $y_q$ values in the sample; 
$\sigma_y$ = population standard deviation of the $y$ variable; 
$s_y$ = sample standard deviation of the $y$ variable; 
$\rho$ = population value of an ordinary Pearsonian coefficient; 
$r$ = sample value of an ordinary Pearsonian coefficient; 
$\rho_{pbi} = (\mu_p - \mu_q)\sqrt{PQ}/\sigma_y$, the point biserial coefficient in the population; 
$r_{pbi} = (\bar{y}_p - \bar{y}_q)\sqrt{pq}/s_y$, the point biserial coefficient in a sample; 
$z$ = a variable normally distributed about zero with unit variance; 
$w$ = a variable distributed independently of $z$ as $\chi^2/n$; 
$t = z/\sqrt{w}$, Student’s variable; 
$\bar{t} = (z + \delta)/\sqrt{w}$, the non-central $t$ statistic in which $\delta$ is a constant representing the 
degree of "non-centralness" and related to the power of Student’s $t$-test; 
$\epsilon$ = the probability that a given value in $\bar{t}$, say $|\bar{t}_0|$, will be exceeded by $|\bar{t}|$, that is, the 
probability level selected for determination of upper and lower fiducial limits 
(e.g., $\epsilon = .025$ and $1 - \epsilon = .975$ when 5 per cent fiducial limits are sought); 

$\sigma_{r_{pbi}}$ = the standard error of a point biserial coefficient of correlation. 

Meaning of non-central t. Perhaps the most direct way to gain an intuitive 
impression of the concept of non-central $t$ (as compared with that of $t$) is 
in terms of an illustrative example concerning the significance of a sample 
mean. An investigator wishes to test the hypothesis $H_0$ that the value of the 
mean $\mu$ of a normal population is $\mu_0$. Customarily one employs Student’s $t$-test, 
which would necessitate the calculation from experimental data of the familiar 
quantity 

$$t = \frac{\sqrt{N}\,(\bar{y} - \mu_0)}{\hat{\sigma}}$$

in which $\bar{y}$ is the sample mean, $N$ is the size of sample, and $\hat{\sigma}$ is an estimate of 
population standard deviation $\sigma$ based upon the square root of the sum of 
squares of deviation measures about the sample mean divided by $N - 1$. The 
value obtained for $t$ is compared with one in the model furnished by the 
$t$-distribution at $N - 1$ degrees of freedom relative to the level of significance 
$\epsilon$ chosen. 

If the hypothesis $H_0$ is thought to be false and if an alternative value $\mu_1$ 
is assumed for $\mu$, one may wish to know what the power of Student’s test may 
be; i.e., what the probability will be of rejecting the (false) hypothesis $H_0$. The 
expression $\sqrt{N}(\bar{y} - \mu_0)/\hat{\sigma}$ for $t$ may be rewritten in a different (but 
algebraically equivalent) form and set equal to $\bar{t}$ as follows: 

$$\bar{t} = \left[\frac{\sqrt{N}\,(\bar{y} - \mu_1)}{\sigma} + \frac{\sqrt{N}\,(\mu_1 - \mu_0)}{\sigma}\right] \Big/ \frac{\hat{\sigma}}{\sigma}.$$

In terms of definitions previously given it can be seen that $z = \sqrt{N}(\bar{y} - \mu_1)/\sigma$, 
$\delta = \sqrt{N}(\mu_1 - \mu_0)/\sigma$, and $\hat{\sigma}/\sigma = \sqrt{w}$. The power of Student’s $t$-test is a 
function of the difference between $\mu_1$ and $\mu_0$ and hence of $\delta$. 
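
The algebraic equivalence just described can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name and the sample values are hypothetical, and the population standard deviation is simply assumed known for the purpose of the check.

```python
import math


def t_decomposition(sample, mu0, mu1, sigma):
    """Return Student's t together with z, delta, and w as defined above.

    sigma (the population standard deviation) is an assumed known value,
    supplied only so that the decomposition can be verified.
    """
    n_obs = len(sample)
    mean = sum(sample) / n_obs
    sum_sq = sum((v - mean) ** 2 for v in sample)
    sigma_hat = math.sqrt(sum_sq / (n_obs - 1))           # estimate of sigma
    t = math.sqrt(n_obs) * (mean - mu0) / sigma_hat       # Student's t
    z = math.sqrt(n_obs) * (mean - mu1) / sigma           # unit normal part
    delta = math.sqrt(n_obs) * (mu1 - mu0) / sigma        # non-centrality
    w = (sigma_hat / sigma) ** 2                          # chi-square / n part
    return t, z, delta, w


# Hypothetical data: the tested value is 10.0, the alternative is 11.0.
values = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
t, z, delta, w = t_decomposition(values, mu0=10.0, mu1=11.0, sigma=1.0)
assert math.isclose(t, (z + delta) / math.sqrt(w))
```

The check holds for any sample, since the two forms differ only by adding and subtracting $\mu_1$ in the numerator and dividing numerator and denominator by $\sigma$.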


I. Sampling Error in $r_{pbi}$ When $\rho_{pbi} = 0$ 


In demonstrating the relationship of $r_{pbi}$ to the non-central $t$-distribution 
Lev has found that 

$$\bar{t} = \frac{\sqrt{N-2}}{\sqrt{1 - r_{pbi}^{2}}}\; r_{pbi} \qquad (0 \le |\rho_{pbi}| < 1) \tag{1}$$

which may be rewritten in a form such that 

$$\delta = \frac{\sqrt{N}\,\rho_{pbi}}{\sqrt{1 - \rho_{pbi}^{2}}}. \tag{2}$$

When $\rho_{pbi}$ is equal to zero, it is evident in (2) that $\delta$ will vanish. Since the 
distribution of $\bar{t}$ is the same as that for $t$ when $\delta$ assumes a value of zero, it is 
apparent that the distribution of sampling errors given by (1) is the same as 
that for Student’s $t$ provided that $\rho_{pbi}$ is equal to zero. Inasmuch as Fisher (4) 
has demonstrated that when $\rho = 0$ the sampling error of $r$ may be related to 
Student’s variable by the expression 

$$t = \sqrt{N-2}\;\frac{r}{\sqrt{1 - r^{2}}} \qquad (\rho = 0), \tag{3}$$

it is clear that the sampling distributions of $r_{pbi}$ and $r$ are equivalent when the 
degree of correlation in the population ($\rho$ or $\rho_{pbi}$) is zero. Therefore entries 
within the familiar tables prepared by Snedecor and Wallace and reproduced 
in Guilford (5), Edwards (3), and other elementary texts may be employed to 
ascertain the value of $r_{pbi}$ required for significance at the 5 per cent and 1 per 
cent levels relative to a given number of degrees of freedom. 
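
As a minimal sketch of how (1), (2), and (3) are used, the functions below compute the statistic and the non-centrality constant and, by solving (3) for $r$, the value of the coefficient required for significance. The function names are hypothetical, the inversion of (3) is an added step not written out in the text, and the critical value 2.262 is the standard two-tailed 5 per cent point of Student's $t$ for 9 degrees of freedom.

```python
import math


def t_bar_from_r_pbi(r_pbi, n_obs):
    """Equation (1): the statistic computed from a sample point biserial r."""
    return math.sqrt(n_obs - 2) * r_pbi / math.sqrt(1 - r_pbi ** 2)


def delta_from_rho_pbi(rho_pbi, n_obs):
    """Equation (2): the non-centrality constant for a population value."""
    return math.sqrt(n_obs) * rho_pbi / math.sqrt(1 - rho_pbi ** 2)


def critical_r(t_critical, df):
    """Equation (3) solved for r: the coefficient required for significance.

    t_critical must be supplied from a table of Student's t.
    """
    return t_critical / math.sqrt(t_critical ** 2 + df)


t_bar = t_bar_from_r_pbi(0.30, 11)       # about .9435, as in Example 1 below
delta_null = delta_from_rho_pbi(0.0, 11)  # vanishes when rho_pbi is zero
r_needed = critical_r(2.262, 9)          # about .602 for N = 11
```

With $N = 11$ an obtained $r_{pbi}$ of .30 therefore falls well short of the value of about .602 needed at the 5 per cent level.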
In the instance of large samples the hypothesis that $\rho_{pbi}$ is equal to zero 
can be tested through use of tables of the normal probability function and of 
the standard error approximation 

$$\sigma_{r_{pbi}} = \frac{1}{\sqrt{N-1}}. \tag{4}$$

For small samples one may also make use of the normal approximation through 
employing Fisher’s familiar $z$-transformation 

$$z' = \tfrac{1}{2}\log_e \frac{1+r}{1-r} \tag{5}$$

in which $r_{pbi}$ would be substituted for $r$, and for which the standard error 
would be estimated by 

$$\sigma_{z'} = \frac{1}{\sqrt{N-3}}. \tag{6}$$
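
The two normal approximations may be sketched as follows, assuming the usual forms of the standard errors, $1/\sqrt{N-1}$ for the coefficient under the null hypothesis and $1/\sqrt{N-3}$ for Fisher's transformed value. The function names are hypothetical.

```python
import math


def se_r_pbi_null(n_obs):
    """Large-sample standard error of r_pbi when rho_pbi = 0."""
    return 1 / math.sqrt(n_obs - 1)


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


def se_fisher_z(n_obs):
    """Standard error of Fisher's transformed value."""
    return 1 / math.sqrt(n_obs - 3)


def normal_deviate_for_zero_rho(r_pbi, n_obs):
    """Normal deviate for testing rho_pbi = 0 via the z-transformation."""
    return fisher_z(r_pbi) / se_fisher_z(n_obs)


deviate = normal_deviate_for_zero_rho(0.30, 11)  # about 0.88
significant = abs(deviate) > 1.96                # False at the 5 per cent level
```

For $r_{pbi} = .30$ and $N = 11$ the deviate is well below 1.96, which agrees with the conclusion reached from the tabled values of $r$ required for significance.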


II. Sampling Error in $r_{pbi}$ When $\rho_{pbi} \neq 0$ 


Although equations (1) and (3) appear to be of the same form, it should 
be emphasized that they are equivalent only if $\rho = \rho_{pbi} = 0$. An inspection of 
the equation of the distribution function for $r$ developed by Fisher (4) and of 
the extremely complicated expression for the sampling distribution of $r_{pbi}$ in 
terms of $\bar{t}$, the distribution function of which involves an inexact integral as 
well as a gamma function, makes the lack of identity of the two frequency 
distributions obvious. However, it would be difficult to 
establish analytically a basis for the determination of the precise amount and 
direction of the difference in fiducial limits for numerically equivalent values 
in $r$ and $r_{pbi}$ relative to a given probability level $\epsilon$ and a specified sample size. 
Therefore through use of numerical methods that depend upon the employment 
of tabular procedures devised by Johnson and Welch (6) the apparent 
differences in the amount and direction of sampling errors in $r$ and $r_{pbi}$ will be 
demonstrated in terms of the fiducial limits in the corresponding parameters. 
If a fixed probability level $\epsilon$ is assumed it will be shown for a representative 
set of values in $r_{pbi}$ and $r$ relative to different numbers of degrees of freedom 
that the width of the fiducial intervals derived from values of $r_{pbi}$ is less than 
that for corresponding intervals based on numerically equal values in $r$ 
irrespective of the amount of correlation in the population. 


An explanation of the use of Johnson and Welch’s Table for determination 
of fiducial limits. To determine the fiducial limits for a parameter, the 
corresponding statistic of which can be related to the non-central $t$-distribution, 
one must ascertain the value of $\delta$, say $\delta_0$, that is associated with a desired 
probability level $\epsilon$ (or $1 - \epsilon$) relative to a given $n$ and to a magnitude $\bar{t}_0$ in $\bar{t}$ which 
is determined from the obtained value in the statistic. Such a value $\delta_0$ may be 
denoted as 

$$\delta_0 = \delta(n, \bar{t}_0, \epsilon). \tag{7}$$

Once $\delta_0$ is calculated, it can be set equal to the general expression for $\delta$ within 
which the parameter is present; for example, the right-hand member of (2) in 
the instance of the point biserial coefficient. Subsequent to substitution of the 
appropriate number for $N$ within the general expression for $\delta$ and of the 
derived constant $\delta_0$ to which this general expression has been set equal, one 
may solve for the parameter, the value for which will constitute either an 
upper or lower fiducial limit depending upon the value chosen for $\epsilon$. 

The exact determination of $\delta_0$ depends upon use of appropriate entries in 
one of the several tables making up Table IV in the article of Johnson and 
Welch (6). The required values in $\delta$ are given exactly by the expression, or 
transformation, 

$$\delta(n, \bar{t}_0, \epsilon) = \bar{t}_0 - \lambda(n, \bar{t}_0, \epsilon)\left[1 + \frac{\bar{t}_0^{\,2}}{2n}\right]^{1/2}, \tag{8}$$

in which $\bar{t}_0$ is determined from values known for the statistic and $N$, and in 
which $\lambda(n, \bar{t}_0, \epsilon)$ is derived from Johnson and Welch’s Table IV. 
Equation (8) may for the sake of simplicity be abbreviated as 

$$\delta_0 = \bar{t}_0 - \lambda_0 \phi, \tag{8'}$$

in which $\delta_0 = \delta(n, \bar{t}_0, \epsilon)$, $\lambda_0 = \lambda(n, \bar{t}_0, \epsilon)$, and $\phi = [1 + \bar{t}_0^{\,2}/2n]^{1/2}$. 

In the instance of large samples the expressions given by (8) and (8') may 
be interpreted within the setting of the normal probability scale in that for 
various levels in $\epsilon$ the quantity $\delta_0$ is distributed approximately normally about 
a mean $\bar{t}_0$ with a standard deviation $\phi$. Multiplication of $\phi$ by $\lambda_0$, which would 
correspond to a normal deviate, yields a deviation score that defines the extent 
of departure of $\delta_0$ from the mean $\bar{t}_0$. In Section III a normal approximation to 
(8') will be considered. 

The tables appearing in Johnson and Welch’s article have been 
constructed in a manner that permits the determination of a maximum amount of 
information in a relatively minimum amount of space. Across the top of each 
table are two rows consisting of the number of degrees of freedom with which 
entries in each column are associated. In the upper of the two rows across the 
top of the table are the degrees of freedom $n = 4, 5, 6, 7, 8, 9, 16, 36, 144$, and 
$\infty$. In the second row the degrees of freedom are associated with $12/\sqrt{n}$ in 
such a manner that corresponding to values of $n = 9, 16, 36, 144$, and $\infty$, the 
values for $12/\sqrt{n}$ are 4, 3, 2, 1, and 0 respectively. Since the function $12/\sqrt{n}$ 
is thus tabled at equal intervals, interpolations required to evaluate $\lambda$ for 
intermediate magnitudes in $n$ may be readily effected. 

At the left of each table are three columns headed by $\bar{t}_0$, $y'$, and $y$, the 
latter two variables serving as auxiliaries in the use of the tables. For the 
whole range of values in $\bar{t}_0$ from $-\infty$ to $+\infty$ only the designations of positive 
and negative are inserted. The two auxiliary variables are defined as 

$$y = \left(1 + \frac{\bar{t}_0^{\,2}}{2n}\right)^{-1/2} \tag{9}$$

and 

$$y' = \frac{\bar{t}_0}{\sqrt{2n}}\left(1 + \frac{\bar{t}_0^{\,2}}{2n}\right)^{-1/2}. \tag{10}$$

For $\bar{t}_0/\sqrt{2n}$ between $-\infty$ and $-0.75$ and between $0.75$ and $+\infty$, $\lambda$ is 
determined in conjunction with the column headed by $y$. In the instance that 
$\bar{t}_0/\sqrt{2n}$ falls between $-0.75$ and $0.75$, $\lambda$ is tabled against the column headed 
by $y'$. For both $y$ and $y'$ the interval of the argument is one tenth. The range 
of values in $y$ and $y'$ may be described by the following intervals: $0 < y < 1.0$ 
and $-1.0 < y' < 1.0$. 

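It is interesting to note that 

$$y = \frac{1}{\phi}, \tag{11}$$

$$y' = \frac{\bar{t}_0}{\sqrt{2n}}\; y, \tag{12}$$

$$y' = \frac{\bar{t}_0}{\sqrt{2n}\,\phi}, \tag{13}$$

$$y = \sqrt{1 - y'^{2}}. \tag{14}$$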
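
The auxiliary quantities and the choice of table column can be sketched as below. The function name is hypothetical, and the treatment of a ratio exactly equal to 0.75 is an assumption, since the text speaks only of values falling between the stated bounds.

```python
import math


def auxiliaries(t0, n):
    """Return phi, y, y', and the table column against which lambda is read.

    t0 is the obtained value of the non-central t statistic and n the
    number of degrees of freedom.
    """
    ratio = t0 / math.sqrt(2 * n)
    phi = math.sqrt(1 + ratio ** 2)
    y = 1 / phi
    y_prime = ratio / phi
    # Assumption: a ratio of exactly 0.75 is assigned to the y' column.
    column = "y" if abs(ratio) > 0.75 else "y'"
    return phi, y, y_prime, column


phi, y, y_prime, column = auxiliaries(0.9435, 9)
# The two auxiliaries are tied together by y = sqrt(1 - y'^2).
assert math.isclose(y, math.sqrt(1 - y_prime ** 2))
```

For the values used in the first illustrative example ($\bar{t}_0 = .9435$, $n = 9$) the ratio is about .22, so that $\lambda$ is read against the $y'$ column.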


Since the entries within the tables are based upon values for $\epsilon < .50$, an 
important relationship is employed to obtain entries for $\lambda$ when $\epsilon > .50$: 

$$\lambda(n, \bar{t}_0, \epsilon) = -\lambda(n, -\bar{t}_0, 1 - \epsilon). \tag{15}$$

From use of tables designed for the direct calculation of but one fiducial point 
this equation permits the determination of both upper and lower fiducial 
limits corresponding to probabilities of $\epsilon$ and $1 - \epsilon$ (where now $\epsilon < .50$) such 
that $|\bar{t}| > |\bar{t}_0|$. 

The steps involved in evaluating $\delta_0$ in (8) such that the probability that 
$|\bar{t}| > |\bar{t}_0|$ is equal to $\epsilon$ may be summarized as follows: 


(a) Obtain $\bar{t}_0$ relative to the values known for $n$ and the statistic of which it is a function. 

(b) Determine $\phi$ from knowledge of $\bar{t}_0$ and $n$. 

(c) Evaluate $y$ or $y'$ depending upon whether $|\bar{t}_0/\sqrt{2n}|$ is greater than or less than 0.75. 

(d) Whenever $n > 9$, calculate $12/\sqrt{n}$. 

(e) If $\epsilon$ is one of the values for which separate tables have been prepared obtain the value 
for $\lambda$ relative to information gained from steps (c) and (d), a step usually including 
single or double interpolation. (Ordinarily a linear interpolation procedure will suffice.) 

(f) Substitute derived values for $\bar{t}_0$, $\phi$, and $\lambda_0$ into (8') to determine $\delta_0$. 

(g) In the event that $\epsilon > .50$ evaluate $\delta(n, -\bar{t}_0, 1 - \epsilon)$ and subsequently change its sign. 


Illustrative Example 1. For a sample of $N = 11$ in which $r_{pbi} = .30$, 
find $\delta(9, \bar{t}_0, .025)$ and $\delta(9, \bar{t}_0, .975)$. 


(a) From equation (1) 

$$\bar{t}_0 = \left[\frac{\sqrt{11 - 2}}{\sqrt{1 - (.30)^{2}}}\right](.30) = .9435.$$

(b) In (8) or (8'): 

$$\phi = \left(1 + \frac{(.9435)^{2}}{18}\right)^{1/2} = 1.024.$$

(c) Since $.9435/\sqrt{18}$ is less than .75, $y'$ (instead of $y$) is evaluated by (13) to be 

$$y' = \frac{.9435}{\sqrt{2 \cdot 9}} \cdot \frac{1}{1.024} = .2172.$$

(d) Since $n = 9$, the expression $12/\sqrt{n} = 4$, a value not entailing any interpolation 
between columns. 

(e) In evaluating $\lambda(9, .9435, .025)$ it is seen in the tables that for $n = 9$ (or for $12/\sqrt{n} = 4$) 
entries for $\lambda_0$ at $y' = .20$ and $y' = .30$ are 1.981 and 1.989, respectively. Simple linear 
interpolation yields that $\lambda = 1.983$ when $y' = .2172$. 

(f) Through substitution in (8) 

$$\delta(9, .9435, .025) = .9435 - 1.983(1.024) = -1.088.$$

(g) For evaluation of $\delta(9, .9435, .975)$ use is made of the fact that 

$$\delta(9, .9435, .975) = -\delta(9, -.9435, 1 - .975).$$

Therefore it is apparent that 

$$\delta(9, .9435, .975) = -[-.9435 - 1.983(1.024)] = 2.974.$$
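
The arithmetic of steps (a), (b), (f), and (g) can be retraced with a minimal sketch. The value of $\lambda$ is not computed here; it is taken as the tabled figure 1.983 quoted in step (e), since the entries of Johnson and Welch's Table IV are not reproduced in this paper. The function name is hypothetical.

```python
import math


def delta_limit(t0, n, lam):
    """Equation (8'): delta_0 = t0 - lambda_0 * phi, with lambda_0 supplied."""
    phi = math.sqrt(1 + t0 ** 2 / (2 * n))
    return t0 - lam * phi


n_obs, r_pbi = 11, 0.30
n = n_obs - 2                                            # degrees of freedom
t0 = math.sqrt(n) * r_pbi / math.sqrt(1 - r_pbi ** 2)    # step (a)
lam = 1.983  # tabled value quoted in step (e); not derived by this code

delta_for_025 = delta_limit(t0, n, lam)      # step (f), about -1.088
delta_for_975 = -delta_limit(-t0, n, lam)    # step (g), about 2.97
```

Carrying unrounded intermediate values gives a figure for step (g) that differs from the 2.974 of the text only in the third decimal place.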


When the respective values obtained for $\delta$ are set equal to the right-hand 
member of equation (2) into which the known value of $N$ is substituted, upper 
and lower fiducial limits may be calculated for $\rho_{pbi}$. It is probably advantageous 
to rewrite (2) in a form that is solved explicitly for the desired parameter: 

$$\rho_{pbi} = \frac{\delta_0}{\sqrt{N + \delta_0^{2}}}. \tag{16}$$

Illustrative Example 2. For the data given in the first example, find the 
upper and lower fiducial limits $\rho_{pbi\,U}$ and $\rho_{pbi\,L}$. 
Through use of (16) it is found that 

$$\rho_{pbi\,L} = \frac{-1.088}{\sqrt{11 + (-1.088)^{2}}} = -.312$$

and 

$$\rho_{pbi\,U} = \frac{2.974}{\sqrt{11 + (2.974)^{2}}} = .668.$$

The 5 per cent fiducial interval corresponding to an $r_{pbi}$ of .30 and an $N$ of 11 
is seen to be $-.312 < \rho_{pbi} < .668$. 
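
The conversion from $\delta_0$ to a fiducial limit is a one-line computation, sketched below together with the inverse relation so that the pair can be checked against one another. The function names are hypothetical.

```python
import math


def rho_pbi_from_delta(delta0, n_obs):
    """Solve delta = sqrt(N) * rho / sqrt(1 - rho^2) explicitly for rho."""
    return delta0 / math.sqrt(n_obs + delta0 ** 2)


def delta_from_rho_pbi(rho_pbi, n_obs):
    """The inverse relation, used here only as a round-trip check."""
    return math.sqrt(n_obs) * rho_pbi / math.sqrt(1 - rho_pbi ** 2)


lower = rho_pbi_from_delta(-1.088, 11)   # about -.312
upper = rho_pbi_from_delta(2.974, 11)    # about .668

# Substituting a limit back into the inverse relation recovers delta_0.
assert math.isclose(delta_from_rho_pbi(lower, 11), -1.088)
assert math.isclose(delta_from_rho_pbi(upper, 11), 2.974)
```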


A comparison of the size of fiducial intervals associated with $r_{pbi}$ and $r$. From 
the tabulations presented by F. N. David (1) the lower and upper fiducial 
limits in $\rho$ when $r = .30$ and $N = 11$ are estimated to be $-.356$ and $.752$ 
compared with those limits of $-.312$ and $.668$ in $\rho_{pbi}$ when $r_{pbi} = .30$ and 
$N = 11$. [If fiducial limits accurate to only hundredths are desired a chart in 
David (1) may be employed to advantage. This chart has been reproduced in 
Dixon and Massey (2, p. 327).] 

Thus it is apparent that the fiducial interval derived from $r_{pbi}$ is smaller 
than that based upon the same numerical value in $r$. That such a finding is 
true throughout an extensive range of equal numerical values in $r$ and $r_{pbi}$ 
irrespective of the size of sample is readily apparent from an inspection of the 
corresponding pairs of fiducial intervals presented in Table 1. Entries for $N$ 


TABLE 1 
A Comparison of the Widths of the 5 per cent Fiducial Intervals of 
$\rho$ and $\rho_{pbi}$ Corresponding to Numerical Values in $r$ and $r_{pbi}$ of 
0.00, 0.30, 0.60, 0.80 and 0.90 for Samples of Size 6, 11, 18, 38 and 146* 
(? marks a digit that is illegible in the source.) 

                                  Size of Sample 
Statistic  Value   N = 6        N = 11       N = 18       N = 38       N = 146 
r          0.00    -76 to 76    -58 to 58    -46 to 46    -34 to 34    -17 to 17 
r_pbi      0.00    -62 to 62    -51 to 51    -42 to 42    -30 to 30    -16 to 16 
r          0.30    -?2 to 8?    -35 to 73    -19 to 66    -0? to 57     13 to 44 
r_pbi      0.30    -4? to 72    -31 to 66    -17 to 61     02 to 5?     14 to 43 
r          0.60    -37 to 92     00 to 87     16 to 82     33 to 77     4? to 69 
r_pbi      0.60    -28 to ?3     01 to 80     18 to 78     35 to 73     ?9 to 68 
r          0.80    -03 to 97     38 to 93     52 to 92     64 to 88     72 to 85 
r_pbi      0.80     01 to 91     37 to 90     53 to 88     65 to 87     72 to ?4 
r          0.90     ?? to 98     62 to 97     74 to 96     81 to 93     87 to 93 
r_pbi      0.90     26 to 95     64 to 95     75 to 94     82 to 93     87 to 92 

*Decimal points for fiducial limits are omitted. 
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were selected in a manner to duplicate in part those given in Johnson and 
Welch’s Table IV. 

As might be expected the amount of discrepancy in the widths of the
corresponding intervals is inversely proportional to the size of the sample and
directly proportional to the magnitude of the correlation coefficient obtained
for the sample. A possible intuitive interpretation of why the coefficient
$r_{pbi}$ is slightly more stable than the coefficient $r$ is that sampling
perturbations too small to induce values in $x$ to change from 0 to 1 or from
1 to 0 can be more readily reflected when $x$ is a continuous variable.

The reader may wonder why it is that, when $r_{pbi}$ and $r$ assume values of
zero, the two corresponding fiducial intervals are not defined by the same
limits, especially since it was implied previously that for $\rho = \rho_{pbi} = 0$ the same
values in $r$ and $r_{pbi}$ were required for significance at a given level of
probability. In the instance of the identity of values in $r$ and $r_{pbi}$ required for a
given significance level, the two sampling distributions involving these
statistics about the parameter values of $\rho = 0$ and $\rho_{pbi} = 0$ are the same. Hence
the pairs of upper and lower sampling limits in $r$ and $r_{pbi}$ about $\rho$ and $\rho_{pbi}$ that
correspond to the probability levels of $1 - \epsilon$ and $\epsilon$ are the same.

In the case of the fiducial interval, however, the question being asked is in
effect: What two values must a parameter assume such that the probability
of the deviation (or discrepancy) of a given statistic from that parameter will
be $\epsilon$? Relative to the 5 per cent level the upper fiducial limit is that value of
the parameter such that the probability of a statistic departing more than the
given one in a direction below the parameter is .025; the lower fiducial limit
corresponds to a value of the parameter such that the probability of obtaining
a statistic deviating more than the obtained one in a direction above the
parameter is .025. For a coefficient of $r = 0$ or $r_{pbi} = 0$ the upper and lower
values of the parameters $\rho$ and $\rho_{pbi}$ corresponding to the fiducial limits will not
be zero. Whenever the values of $\rho$ and $\rho_{pbi}$ are not zero the sampling
distributions of the corresponding statistics are not the same. Therefore, the fiducial
limits will be different for $r$ and $r_{pbi}$.
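
A short numerical sketch may make the point concrete for $r_{pbi} = 0$. It rests on stated assumptions rather than on anything printed in the paper: that $t_0 = 0$ and $\phi = 1$ when $r_{pbi} = 0$, so that the two values of $\delta_0$ are simply plus and minus the multiplier, and that the multiplier may be taken as 1.960, the normal deviate for $\epsilon = .025$. The function name is illustrative.

```python
import math

K = 1.960  # unit-normal deviate for epsilon = .025 (assumed multiplier)

def limits_at_zero(n_cases: int) -> tuple:
    """Limits in rho_pbi when r_pbi = 0: delta_0 = -K and +K, then equation (16)."""
    half_width = K / math.sqrt(n_cases + K ** 2)
    return -half_width, half_width

for n in (6, 11, 18, 38, 146):
    lo, hi = limits_at_zero(n)
    print(f"N = {n:3d}: {lo:.2f} to {hi:.2f}")
```

Under these assumptions the limits agree, to two decimals, with the $r_{pbi} = 0.00$ row of Table 1, and none of them is zero even though the obtained coefficient is.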

It is known that the degree of skewness in the distribution of correlational
statistics is a function of the size of the parameter. For larger values of $r$ the
corresponding presence of skewness in the sampling distribution is reflected
by the amount of departure of the given $r$ or $r_{pbi}$ from the upper fiducial limit
of the parameter compared with that from the lower fiducial limit of the
parameter.


Some useful tabulations in the estimation of $t_0$, $\phi$, and fiducial limits in
$\rho_{pbi}$. Corresponding to steps (a) and (b) described previously (prior to
Illustrative Example 1) in the evaluation of $\delta_0$ in (8’) two sets of tabulations are
presented in Tables 2 and 3 that may be employed in the estimation of $t_0$ and
$\phi$. Entries within Table 2 are for $t_0$ as determined from equation (1). Although
in an actual problem interpolation would probably be required because of the
magnitude of the steps of .05 in $r_{pbi}$ and of the limited number of specific
sample sizes, such a tabulation affords not only a quick means for an
approximation of values in $t_0$, but also a number of reference points for checking
purposes.


                                        TABLE 2

Values in $t_0$ Corresponding to Selected Magnitudes of $r_{pbi}$ and to Samples of Various Size

                                    Size of Sample N
r_pbi     N = 6    N = 7    N = 8    N = 9   N = 10   N = 11   N = 18   N = 38  N = 146
0.00     0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
0.05     0.1001   0.1119   0.1226   0.1325   0.1416   0.1502   0.2003   0.3004   0.6008
0.10     0.2010   0.2247   0.2462   0.2659   0.2843   0.3015   0.4020   0.6030   1.2060
0.15     0.3034   0.3392   0.3716   0.4014   0.4291   0.4552   0.6069   0.9103   1.8206
0.20     0.4082   0.4564   0.5000   0.5401   0.5774   0.6124   0.8165   1.2247   2.4495
0.25     0.5164   0.5774   0.6325   0.6831   0.7303   0.7746   1.0328   1.5492   3.0984
0.30     0.6290   0.7032   0.7703   0.8321   0.8895   0.9435   1.2579   1.8869   3.7738
0.35     0.7473   0.8355   0.9152   0.9885   1.0568   1.1209   1.4945   2.2418   4.4836
0.40     0.8729   0.9759   1.0690   1.1547   1.2344   1.3093   1.7457   2.6186   5.2372
0.45     1.0078   1.1268   1.2343   1.3332   1.4253   1.5117   2.0156   3.0234   6.0468
0.50     1.1547   1.2910   1.4142   1.5275   1.6330   1.7321   2.3094   3.4641   6.9282
0.55     1.3171   1.4726   1.6131   1.7424   1.8627   1.9757   2.6342   3.9513   7.9026
0.60     1.5000   1.6771   1.8371   1.9843   2.1213   2.2500   3.0000   4.5000   9.0000
0.65     1.7107   1.9126   2.0951   2.2629   2.4193   2.5660   3.4213   5.1320  10.2640
0.70     1.9604   2.1918   2.4010   2.5934   2.7724   2.9406   3.9208   5.8812  11.7624
0.75     2.2678   2.5355   2.7775   3.0000   3.2071   3.4017   4.5356   6.8034  13.6067
0.80     2.6667   2.9814   3.2660   3.5277   3.7712   4.0000   5.3333   8.0000  16.0000
0.85     3.2271   3.6080   3.9524   4.2691   4.5639   4.8407   6.4543   9.6814  19.3628
0.90     4.1295   4.6169   5.0576   5.4628   5.8400   6.1942   8.2590  12.3884  24.7769
0.95     6.0849   6.8031   7.4524   8.0495   8.6053   9.0733  12.1697  18.2546  36.5092
1.00       ∞        ∞        ∞        ∞        ∞        ∞        ∞        ∞        ∞
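
Where interpolation in Table 2 is inconvenient, an entry can be computed directly. The sketch below is illustrative only: equation (1) is not reproduced in this part of the paper, so the form $t_0 = r_{pbi}\sqrt{N-2}/\sqrt{1-r_{pbi}^2}$ is an assumption, chosen because it reproduces the tabled entries; the function name is likewise hypothetical.

```python
import math

def t0_from_r(r_pbi: float, n_cases: int) -> float:
    """Assumed form of equation (1): t_0 = r * sqrt(N - 2) / sqrt(1 - r**2)."""
    return r_pbi * math.sqrt(n_cases - 2) / math.sqrt(1.0 - r_pbi ** 2)

sample_sizes = (6, 7, 8, 9, 10, 11, 18, 38, 146)
for r in (0.30, 0.60, 0.80):
    row = " ".join(f"{t0_from_r(r, n):8.4f}" for n in sample_sizes)
    print(f"{r:.2f} {row}")
```

The three rows printed can be compared with the rows for 0.30, 0.60, and 0.80 in Table 2; any value of $r_{pbi}$ between the tabled steps can be passed in the same way.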


                         TABLE 3

Values for $\phi$ as a Function of Selected Magnitudes in $r_{pbi}$

r_pbi     φ      |   r_pbi     φ      |   r_pbi     φ
0.00    1.0000   |   0.35    1.0343   |   0.70    1.2167
0.05    1.0006   |   0.40    1.0465   |   0.75    1.2817
0.10    1.0025   |   0.45    1.0616   |   0.80    1.3744
0.15    1.0057   |   0.50    1.0801   |   0.85    1.5172
0.20    1.0104   |   0.55    1.1031   |   0.90    1.7696
0.25    1.0165   |   0.60    1.1319   |   0.95    2.3724
0.30    1.0244   |   0.65    1.1687   |   1.00      ∞











If, in the expression $[1 + t_0^2/2n]^{1/2}$ for $\phi$, one substitutes the right-hand
member of (1) for $t_0$, it is readily shown that

$$\phi = \left[\frac{1 - r_{pbi}^2/2}{1 - r_{pbi}^2}\right]^{1/2}, \tag{17}$$

a quantity which does not involve the size of sample. In Table 3 magnitudes
for $\phi$ are given corresponding to steps of .05 in $r_{pbi}$.
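
The intermediate algebra is brief and may be written out as follows. This is a reconstruction offered as a sketch: it assumes that $n = N - 2$ and that equation (1) has the form $t_0 = r_{pbi}\sqrt{N-2}/\sqrt{1-r_{pbi}^2}$, neither of which is restated in this section.

```latex
\begin{aligned}
\phi^2 &= 1 + \frac{t_0^2}{2n},
\qquad t_0^2 = \frac{n\, r_{pbi}^2}{1 - r_{pbi}^2} \\
       &= 1 + \frac{r_{pbi}^2}{2\,(1 - r_{pbi}^2)}
        = \frac{2 - r_{pbi}^2}{2\,(1 - r_{pbi}^2)}
        = \frac{1 - r_{pbi}^2/2}{1 - r_{pbi}^2}.
\end{aligned}
```

For $r_{pbi} = .30$ this gives $\phi^2 = .955/.91 = 1.0495$ and $\phi = 1.0244$, the entry shown in Table 3. Since $n$ cancels, the same value of $\phi$ serves for every size of sample.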
In Table 4 entries for $\delta$ are presented that are a function of values in $\rho_{pbi}$
as given by equation (2). Although these numerical values reflect the degree of
non-centralness in $t_0$ for various magnitudes in $\rho_{pbi}$ and N, they may be used
as a basis for estimation of fiducial limits in $\rho_{pbi}$. Depending upon whether the
sign for $\delta$ is positive or negative, that for $\rho_{pbi}$ will be positive or negative.


                                        TABLE 4

Values in $\delta$ Corresponding to Selected Magnitudes of $\rho_{pbi}$ and to Samples of Various Size

                                    Size of Sample N
ρ_pbi     N = 6    N = 7    N = 8    N = 9   N = 10   N = 11   N = 18   N = 38  N = 146
0.00     0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
0.05     0.1226   0.1325   0.1416   0.1502   0.1583   0.1660   0.2124   0.3086   0.6049
0.10     0.2462   0.2659   0.2843   0.3015   0.3178   0.3333   0.4264   0.6195   1.2144
0.15     0.3716   0.4014   0.4291   0.4552   0.4798   0.5032   0.6437   0.9352   1.8332
0.20     0.5000   0.5401   0.5774   0.6124   0.6455   0.6770   0.8660   1.2583   2.4664
0.25     0.6325   0.6831   0.7303   0.7746   0.8165   0.8563   1.0954   1.5916   3.1198
0.30     0.7703   0.8321   0.8895   0.9435   0.9945   1.0430   1.3343   1.9386   3.8000
0.35     0.9152   0.9885   1.0568   1.1209   1.1815   1.2392   1.5852   2.3032   4.5146
0.40     1.0690   1.1547   1.2344   1.3093   1.3801   1.4475   1.8516   2.6904   5.2735
0.45     1.2343   1.3332   1.4252   1.5117   1.5935   1.6713   2.1379   3.1063   6.0887
0.50     1.4142   1.5275   1.6330   1.7321   1.8257   1.9149   2.4495   3.5590   6.9762
0.55     1.6131   1.7424   1.8627   1.9757   2.0825   2.1842   2.7940   4.0596   7.9573
0.60     1.8371   1.9843   2.1213   2.2500   2.3717   2.4875   3.1820   4.6233   9.0623
0.65     2.0951   2.2630   2.4193   2.5660   2.7048   2.8368   3.6289   5.2727  10.3352
0.70     2.4010   2.5934   2.7724   2.9406   3.0997   3.2509   4.1586   6.0423  11.8438
0.75     2.7775   3.0000   3.2071   3.4017   3.5857   3.7607   4.8107   6.9898  13.7009
0.80     3.2660   3.5277   3.7712   4.0000   4.2164   4.4222   5.6569   8.2192  16.1107
0.85     3.9524   4.2691   4.5639   4.8407   5.1026   5.3516   6.8458   9.9467  19.4968
0.90     5.0576   5.4628   5.8400   6.1942   6.5293   6.8480   8.7600  12.7279  24.9484
0.95     7.4524   8.0495   8.6053   9.0733   9.6210  10.0906  12.9080  18.7548  36.7619
1.00       ∞        ∞        ∞        ∞        ∞        ∞        ∞        ∞        ∞


For the data in Example 2 it is seen that the value of -1.088 in $\delta_0$ falls, in
absolute value, between 1.0430 and 1.2392 and that the other value of 2.974 in
$\delta_0$ lies between 2.8368 and 3.2509. Simple linear interpolation reveals that the
corresponding fiducial limits would be approximately -.31 and .67.
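
The interpolation can be sketched as follows. The four (ρ, δ) pairs are the bracketing entries quoted above from the N = 11 column of Table 4; the function name and the convention of interpolating on the absolute value of $\delta_0$ and restoring the sign afterwards are illustrative assumptions.

```python
def interpolate_rho(delta: float, lower_pt: tuple, upper_pt: tuple) -> float:
    """Linear interpolation between two (rho, delta) entries of one Table 4 column.

    The table lists positive deltas only, so the magnitude is interpolated
    and the sign of delta is restored on the result.
    """
    (rho_a, delta_a), (rho_b, delta_b) = lower_pt, upper_pt
    magnitude = abs(delta)
    rho = rho_a + (rho_b - rho_a) * (magnitude - delta_a) / (delta_b - delta_a)
    return rho if delta >= 0 else -rho

# Bracketing entries from the N = 11 column of Table 4.
lower = interpolate_rho(-1.088, (0.30, 1.0430), (0.35, 1.2392))
upper = interpolate_rho(2.974, (0.65, 2.8368), (0.70, 3.2509))
print(f"{lower:.2f} {upper:.2f}")
```

Because the relation between $\delta$ and $\rho_{pbi}$ is only mildly curved over a step of .05, the interpolated limits agree with the directly computed ones to two decimals in this example.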


III. A Normal Approximation to the Transformation Employed in the
Determination of $\delta_0$ [Equation (8’)]

In their discussion concerning large sample results Johnson and Welch
(6, 367-70) have suggested that in the expression

$$t_0 = \delta_0 + \lambda_0 \phi \tag{18}$$

$t_0$ may be referred to a normal scale with mean $\delta_0$ and standard deviation $\phi$
if $K_\epsilon$, the deviate of a unit normal curve exceeded with a probability $\epsilon$, is
substituted for $\lambda_0$. If $t_0$ is known and if this deviate $K_\epsilon$ is substituted for $\lambda_0$
in (8’), a means is furnished for obtaining approximate values in $\delta_0$ necessary
to the determination of approximate fiducial limits in $\rho_{pbi}$ through use of the
following transformation:

$$\delta_0 = t_0 - \phi K_\epsilon. \tag{19}$$

In this normal approximation to (8’) one may deduce that $\delta_0$ is essentially
normally distributed about a mean $t_0$ with a standard deviation of $\phi$.

Corresponding to the 5 per cent fiducial limits in $\rho_{pbi}$ the deviate $K_\epsilon$
assumes a value of 1.960. Hence relative to a value of .025 in $\epsilon$ (and a value of
.975 in $1 - \epsilon$) the transformation may be written as

$$\delta_0 = t_0 - (\pm 1.960)\phi. \tag{19’}$$

In terms of the application of (19) to the determination of $\delta_0$ in the
instance of the point biserial coefficient it may be said simply that the mean and
standard deviation are given by the right-hand members of (1) and (17),
respectively. The tabulations given in Table 2 and Table 3 may be employed
to obtain estimates of the mean and standard deviation, respectively.

Through use of (19’) the fiducial limits in $\rho_{pbi}$ corresponding to the selected
values of $r_{pbi}$ and N in Table 1 were calculated. Only one limit differed by as
much as .01 from those computed by the exact method based on the use of
Johnson and Welch’s tables. That such a finding might be expected is readily
apparent from the relatively small departure of values in $\lambda_0$ from those of
1.960 in $K_\epsilon$. For n = 4, 9, 16, 36, and 144 the entries in $\lambda_0$ fall between 1.869
and 2.006, 1.904 and 1.996, 1.920 and 1.989, 1.935 and 1.981, and 1.943 and
1.971, respectively. Even for samples consisting of as few cases as 10 the
normal approximation will furnish estimates of fiducial limits not likely to differ
by more than .01 from those determined by the exact procedure provided that
$\epsilon < .025$.


Illustrative Example 3. For the data given in the two previous examples,
estimate the 5 per cent fiducial limits through use of the normal-approximation
transformation given by (19’).

Since $t_0 = .9435$, $K_\epsilon = 1.960$, and $\phi = 1.0244$,

$$\delta_0 = .9435 - (\pm 1.960)(1.0244),$$

or -1.064 and 2.951. (Values of -1.088 and 2.974 were calculated through
employment of the exact method.) Substitution of these two values for $\delta_0$
into (16) yields

$$\rho_{pbi_L} = \frac{-1.064}{\sqrt{11 + (-1.064)^2}} = -.306$$

and

$$\rho_{pbi_U} = \frac{2.951}{\sqrt{11 + (2.951)^2}} = .665.$$

These lower and upper fiducial limits approximate closely those of -.312 and
.668 obtained through use of the exact procedure.
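
The whole of Example 3 can be traced in a few lines of Python. This is a sketch under stated assumptions, not the author's own computing routine: the form of equation (1) used in `t0_from_r` is inferred from Table 2, and all function names are illustrative.

```python
import math

def t0_from_r(r: float, n_cases: int) -> float:
    """Assumed form of equation (1)."""
    return r * math.sqrt(n_cases - 2) / math.sqrt(1.0 - r ** 2)

def phi_from_r(r: float) -> float:
    """Equation (17)."""
    return math.sqrt((1.0 - r ** 2 / 2.0) / (1.0 - r ** 2))

def rho_from_delta(delta: float, n_cases: int) -> float:
    """Equation (16)."""
    return delta / math.sqrt(n_cases + delta ** 2)

def normal_approx_limits(r: float, n_cases: int, k: float = 1.960) -> tuple:
    """Equation (19'): delta_0 = t_0 - (+/-k) * phi, then equation (16)."""
    t0, phi = t0_from_r(r, n_cases), phi_from_r(r)
    delta_lower = t0 - k * phi  # positive sign on k gives the lower limit
    delta_upper = t0 + k * phi  # negative sign on k gives the upper limit
    return rho_from_delta(delta_lower, n_cases), rho_from_delta(delta_upper, n_cases)

lower, upper = normal_approx_limits(0.30, 11)
print(f"{lower:.3f} {upper:.3f}")
```

A different level of $\epsilon$ is obtained by passing the corresponding normal deviate as `k`.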
IV. A Single General Formula for Determination of Fiducial Limits in $\rho_{pbi}$


The procedures that have been described previously for the determination
of the fiducial limits in $\rho_{pbi}$ may be combined into a single equation in
which the independent variables are $r_{pbi}$, N, and $\lambda_0$ and in which the
dependent variable representing either one of two fiducial limits may be denoted as
$\rho_{pbi_F}$. Substitution of the right-hand member of (1) for $t_0$ and of the right-hand
member of (17) for $\phi$ into (8’) yields

$$\delta_0 = \frac{\sqrt{N-2}\; r_{pbi}}{\sqrt{1 - r_{pbi}^2}} - \lambda_0 \frac{\sqrt{1 - r_{pbi}^2/2}}{\sqrt{1 - r_{pbi}^2}}. \tag{20}$$

Insertion of the right-hand member of (20) for $\delta_0$ in (16) and subsequent
algebraic simplification leads to the following expression from which fiducial
limits can be obtained:

$$\rho_{pbi_F} = \frac{\sqrt{N-2}\; r_{pbi} - \lambda_0 \sqrt{1 - r_{pbi}^2/2}}{\sqrt{N + \lambda_0^2 - 2 r_{pbi}^2 - \lambda_0^2 r_{pbi}^2/2 - 2 \lambda_0 r_{pbi} \sqrt{N-2}\, \sqrt{1 - r_{pbi}^2/2}}}. \tag{21}$$

To obtain the lower or upper fiducial limit one affixes a positive or a
negative sign, respectively, to the entry obtained for $\lambda_0$. If the procedure
based on an approximation to normality is followed $K_\epsilon$ is substituted for $\lambda_0$.
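
The equivalence of the single formula and the two-step procedure can be verified numerically. The sketch below assumes the sign convention $\delta_0 = t_0 - \lambda_0\phi$ of (8’) and (19), the form of equation (1) inferred from Table 2, and, for the numerical check, the normal deviate 1.960 in place of $\lambda_0$; the function names are illustrative.

```python
import math

def general_limit(r: float, n_cases: int, lam: float) -> float:
    """Single-formula limit; positive lam gives the lower limit, negative lam the upper."""
    a = math.sqrt(n_cases - 2)         # sqrt(N - 2)
    b = math.sqrt(1.0 - r ** 2 / 2.0)  # sqrt(1 - r^2 / 2)
    numerator = a * r - lam * b
    radicand = (n_cases + lam ** 2 - 2.0 * r ** 2
                - lam ** 2 * r ** 2 / 2.0 - 2.0 * lam * r * a * b)
    return numerator / math.sqrt(radicand)

def two_step_limit(r: float, n_cases: int, lam: float) -> float:
    """Same limit via delta_0 = t_0 - lam * phi followed by equation (16)."""
    t0 = r * math.sqrt(n_cases - 2) / math.sqrt(1.0 - r ** 2)
    phi = math.sqrt((1.0 - r ** 2 / 2.0) / (1.0 - r ** 2))
    delta = t0 - lam * phi
    return delta / math.sqrt(n_cases + delta ** 2)

for lam in (1.960, -1.960):
    print(f"{general_limit(0.30, 11, lam):.3f} {two_step_limit(0.30, 11, lam):.3f}")
```

The two routes differ only in rounding error, so the choice between them is one of convenience: the single formula avoids the intermediate quantities, while the two-step route allows Tables 2, 3, and 4 to be used.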
BOOK REVIEWS


WILLIAM STEPHENSON. The Study of Behavior: Q-Technique and Its Methodology. Chicago:
University of Chicago Press, 1953. pp. ix + 376. 


Stephenson was among the original investigators of correlation between persons, 
when the method first received attention in England about twenty years ago. While others 
considered the idea and dropped it, he maintained his interest and has evolved a variety of 
related techniques, terming the whole “Q-methodology.” In this book he presents for the 
first time a single comprehensive collection of his ideas. 

The broad and all-inclusive title suggests Stephenson’s claims for Q-methodology. 
He proposes it as the key to investigation of such popular matters as the self-concept, 
psychoanalytic theory, projective techniques, and the individual personality. He states 
that his methods enable one to study the individual objectively and systematically in an 
interactional setting, without regard for norms or individual differences. Such sweeping 
claims are certain to attract the attention of many research workers in fields for which 
statistics and methodology are still inadequate. 

Q-methodology is concerned with the intra-individual comparison of stimuli. Data 
are obtained by requiring each person to rate a group of statements according to their 
relative “significance.” Stephenson regards his methods as yielding fundamentally different
types of results from “R technique” studies of correlation between tests, and indeed as 
resting on a superior philosophy of science. He also disavows interest in that type of
correlation between persons which Burt originally studied, based on the transpose of the
matrix of test scores for persons.

It should be made clear that Q-methodology is not a single method. It is a bundle of 
loosely related devices which can be separately adopted or rejected. In particular, while 
Q correlation is the parent technique, it is now only one of many devices in the collection. 
Stephenson deals with at least seven technical inventions: (1) use of questionnaire items to 
record perceptions of the self and others under various instructional sets, (2) obtaining 
responses by card-sorting, (3) forcing responses into a pre-established distribution, (4) 
description of profile similarity by Q correlation, (5) factor analysis of Q correlations, (6) 
selection of questionnaire items in terms of a “structured design” patterned after Fisher’s 
experimental procedures, (7) application of analysis of variance to data from the single 
individual. We shall consider each of these briefly. 

The flexible use of questionnaire items is a distinctly promising proposal. One can 
ask a person to describe himself, and also to describe his acquaintances; to describe his 
ideal, his pre-therapy self, or himself as perceived by his family, his friends, etc. The 
technique is also useful for standardizing descriptions based on projective protocols or 
impressions of a therapist. While responses of this type are subject to all the limitations 
of introspective report, they are more comparable than verbal descriptions, and scores from 
them can be more reliable than ratings. They should be useful for the investigation of many 
types of hypotheses. 

Stephenson writes items on separate cards and asks the subject to sort the cards 
according to their scale value, rather than marking them one at a time in a booklet. Sorting 
has the possible advantage of providing a more constant frame of reference. Stephenson
has evidently made no empirical studies to determine whether sorting gives appreciably
different results from the simpler booklet method.


With the card-sort, Stephenson can require that a pre-determined number of cards
be placed at each scale position to obtain a near-normal distribution. This forcing is of
dubious value. It does ensure variance in the responses and eliminates response sets. How- 
ever, it discards possibly important information about differences in scatter, and gives 
data to which analysis of variance cannot properly be applied. 

Stephenson may have originally introduced the forced normal distribution so that 
he could interpret Q correlations in terms of the properties of the bivariate normal surface. 
If so, he has abandoned the idea. However, he does use Q correlations to measure reliability 
and profile similarity, and for factor analysis. The disadvantages of Q correlation as a 
measure of profile similarity have been discussed elsewhere.* Since Stephenson deals only 
with subjective ratings of statements or other stimuli he does not standardize scores within 
stimuli as has been done when correlations between persons were obtained for test data. 
He correctly argues that such standardization, especially on the basis of a small sample 
of persons, is often undesirable. However, he ignores the necessity for controlling social 
desirability of items or stimuli to avoid confounding of interpretations.†

Factor analysis of Q correlations is used in two ways. In an “independency analysis” 
a search is made for hypotheses which must be subsequently tested on a fresh sample. In 
“dependency analysis” an attempt is made to rotate the data according to a preconceived 
hypothesis. Factor analysis is not, however, as well adapted to formal statistical tests of 
hypotheses as are other forms of multivariate analysis, and Stephenson’s procedures lead 
to significance tests only when he turns to analysis of variance. 

Many of the factor analyses used as illustrations display highly questionable tech- 
niques. In some cases the analysis is used to identify types of persons, the axes of an ortho- 
gonal simple structure for a group of persons being regarded as defining such types. For 
these studies there are never sufficient variates (persons or sets of responses by one person) 
to locate a structure confidently. Often the structure chosen appears to be arbitrary. One 
of the main examples purports to show that the same two-factor structure is found in 
unselected cases as in cases selected to fit a postulated type (p. 170). The reader can de- 
termine that this example is spurious by eliminating from Stephenson’s Figure 2 the cases 
which are repeated from Figure 1. Immediately the “structure” vanishes. An additional 
defect is that in checking invariance Stephenson does not indicate whether he has examined 
the new sample for the probable presence of factors orthogonal to the original two. Other 
analyses, similarly questionable in execution or interpretation, are used as a basis for broad 
generalizations. 

While factor analysis is introduced repeatedly, it seems never to add clarity to the 
original scores or intercorrelations. In one instance a single matrix is rotated in two ways 
(p. 258, 262), each rotation leading to a distinct theoretical interpretation. Factor analysis 
used in such a manner is an arbitrary and untrustworthy tool, yielding only pseudo- 
elegance. 

In early Q studies, items were collected with no particular rationale, in the belief 
that almost any items in a given domain would give the same results. Stephenson first 
modified this by introducing a “balancing technique,” designed to prevent loss of informa- 
tion about the first factor among the items resulting from a forced-sort. Balancing requires 
that the investigator pair statements, including for each statement one which has near- 
opposite meaning. This important precaution may be hard to execute, especially when 
statements for sorting are selected from a therapy protocol or the like. Now he introduces 
a further modification, the use of a structured sample of statements. This is a particularly 
interesting innovation. Stephenson apparently regards this as much superior to the use of
random collections of statements, although many of his illustrative studies are of the latter
character.

*Cronbach, L. J. and Gleser, G. C. Assessing similarity between profiles. Psychol. Bull., 1953, 50, 456-473.
†Edwards, A. L. and Horst, P. Social desirability as a variable in Q technique studies. Educ. psychol. Measmt., 1953, 13, 620-625.

The structured design is a plan for measuring trait interactions, in contrast to the 
conventional single-variable method. For example, introversion and dominance have 
ordinarily been measured by means of separate sets of items. Stephenson would choose 
items such that all combinations of the two traits are represented, as in the following 
design: 

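                  Introvert     Extrovert
                +-------------+-------------+
    Dominant    |             |             |
                +-------------+-------------+
    Submissive  |             |             |
                +-------------+-------------+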

He would seek items with which a dominant introvert might describe himself, and so on, 
for each cell of the factor design, using the same number of items for each interaction. 
More than two levels on a scale and more than two scales may be used. The selected items 
would be administered to a subject, and the average score computed for items in each cell, 
row, and column. The row means describe the subject’s location on a dominance-submission 
scale. The column means locate him on an introversion scale. The cell means describe the 
interaction. Conceivably, a person might give extrovert reactions only in connection with 
submissive reactions. His mean in that cell would then be higher than expected from the 
row and column means. This approach to measurement is thoroughly relevant to modern 
personality theory which holds that whether a tendency toward (say) extroversion will be 
expressed in a given situation depends on the person’s other traits. It may also have impli- 
cations for measuring the way an individual integrates abilities and for projective tests 
which are based on interactional concepts. 
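
The scoring just described can be sketched in a few lines of Python. The item scores below are invented for illustration, as are the function names. The interaction for a cell is taken here as the cell mean minus its row and column means plus the grand mean, the usual two-way definition; this is an assumption about how "higher than expected" would be quantified, not a procedure taken from the book under review.

```python
from statistics import mean

# Hypothetical item scores for one subject: four items per cell of the 2 x 2 design.
scores = {
    ("dominant", "introvert"): [2, 3, 2, 3],
    ("dominant", "extrovert"): [1, 2, 1, 2],
    ("submissive", "introvert"): [2, 2, 3, 3],
    ("submissive", "extrovert"): [5, 4, 5, 4],
}

def level_mean(position: int, level: str) -> float:
    """Mean over all items whose cell has `level` at `position` (0 = row, 1 = column)."""
    return mean(s for cell, items in scores.items() if cell[position] == level for s in items)

grand_mean = mean(s for items in scores.values() for s in items)
cell_means = {cell: mean(items) for cell, items in scores.items()}

def interaction(cell: tuple) -> float:
    """Cell mean relative to what the row and column means alone would predict."""
    row, column = cell
    return cell_means[cell] - level_mean(0, row) - level_mean(1, column) + grand_mean

for cell in scores:
    print(cell, cell_means[cell], interaction(cell))
```

In these invented data the submissive-extrovert cell has a positive interaction, which corresponds to the case mentioned above of a person who gives extrovert reactions chiefly in connection with submissive ones.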

Stephenson applies Fisher’s F-test to data from the structured sample to test the 
significance of differences between rows, columns, and interaction for the person. This is a 
modern version of the use of critical ratio to test for significance of differences within a 
profile, with these important changes: (a) no assumption of uniform error for all persons is 
made; (b) many more hypotheses can be tested with the same number of items; (c)
interactions can be tested. Analysis upon a single person leads to such generalizations as “This
woman accepts introvert self-descriptions more often than extrovert ones” or “Variance
in this woman’s response to statements is not significantly accounted for by differences in
impulsiveness expressed in the statements.”

Stephenson does not realize that it is incorrect to use the F-test on data obtained 
from forced sorts. This test is based on the assumption that the values in each cell are 
randomly sampled from an infinite population so that if the null hypothesis holds, the 
probabilities associated with any particular value of a variate remain constant for all 
samples. The forced sort is a type of sampling without replacement from a limited pool of 
values, and hence the probabilities are modified after each selection. Thus if a subject 
assigns the first item to category 1, the probability of assigning any succeeding item to 
that category is reduced. This particular difficulty would be avoided if constraints were 
removed from the sorting. 
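
The dependence introduced by forcing can be illustrated with a small calculation. The quotas below are hypothetical, and the sketch assumes that, under the null hypothesis, a forced sort amounts to dealing the cards at random into the fixed number of places available in each category.

```python
from fractions import Fraction

# Hypothetical forced distribution: places available in each of five categories.
quotas = [2, 4, 8, 4, 2]
total_cards = sum(quotas)

# Chance that the first card dealt falls in category 1.
p_first = Fraction(quotas[0], total_cards)

# Chance that the second card falls there, given that the first already did:
# one place in category 1 and one card overall have been used up.
p_second_given_first = Fraction(quotas[0] - 1, total_cards - 1)

# With unconstrained sorting and independent placements the chance would be unchanged.
p_second_unforced = p_first

print(p_first, p_second_given_first, p_second_unforced)
```

The drop from the first probability to the second is the sampling-without-replacement effect that the F-test does not allow for; with the constraint removed the two probabilities coincide.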

The proposal to test hypotheses about the single individual is promising. It might 
provide a rigorous methodology for studying idiosyncratic aspects of the personality. 
Before analysis of variance within persons can be interpreted, however, the problem of 
randomness must be fully considered. An investigator interested in determining differences 
between various species of plants would make sure that the seeds used are random samples 
of their species and that they are assigned randomly over treatments, so that all effects 
other than the ones under consideration are random. Stephenson says (p. 112) that his items 
are not parallel to Fisher’s plants. “What is at issue, instead of plants, is an operator’s acts
of judgment, or decisions, made about the statements. These acts should be made randomly.”
But it is the testmaker’s judgments in construction of items which must be random with 
regard to extraneous effects, if we wish to generalize from statements in a cell to the factors 
the block supposedly represents. 

If a person tends to reject grammatically complex statements, and the testmaker 
has had trouble writing simply-worded items for one block, then the subject will have a low 
mean for the block quite apart from the content of the items. Also one must question with 
what confidence items can be assigned to specific cells. The whole history of factor analysis 
and test validation warns against trusting that items “mean what they say.” Certainly 
many of the examples given by Stephenson are dubious in this respect. An attitude test,
for example, is structured around the dimensions “hostile-submissive,”
“dominant-subordinate,” “patriotism-minorities-Negroes-Jews.” Even if we waive questions about the
psychology implied in the dimensions, can we have confidence that “I put complete loyalty
to my country high above all other considerations” represents the cell
“Submissive-dominant-patriotism” to which it is assigned? There are many such examples of what is at
best carelessness, at worst perhaps an unavoidable limitation on the structured-sample 
technique. 

Stephenson does not consider issues of this character. He takes no precautions to 
ensure that items in a cell are a random sample of the statements over which he intends to 
generalize. Unless a universe of situations can be meaningfully defined and sampled, gener- 
alizations from sample to universe cannot be made by analysis of variance within persons. 

In summary, then, we find that Stephenson’s proposals are not ready for adoption 
except by sophisticated investigators who can trace his reasoning and evaluate the specific 
methodologies for themselves. It is imperative to discourage students of personality and social
psychology from copying Stephenson’s designs as he presents them. Many research investi- 
gators have run into difficulty—usually unrecognized by them—which made it impossible 
to establish the intended conclusions. We fear that Stephenson’s book may misdirect much 
research effort. 

Our own tentative evaluation of Stephenson’s ingenious schemes, subject to re- 
examination when the necessary technical studies are made, is as follows: The method of 
gathering data includes real advances. Recording evaluations of others on structured 
questionnaires is likely to be a fine tool for validation of assessments and studies of social 
perception. Obtaining repeated responses from the same person with changing sets will be 
useful in studying change during therapy, change in role from situation to situation, etc. 
It is possible that the organization of questionnaires to measure interactions between 
traits by means of a structured design is an important lead. Among the methods of treating 
data, analysis of variance is potentially advantageous if the forced distribution is abandoned,
and if the sampling problems described above can be overcome. There may be some value 
also in the card-sorting technique, provided forcing is omitted. On the other hand, we would 
discard the Q correlation as a general technique to measure profile similarity. Factor analysis 
of Q correlations or other measures of profile similarity appears to have little value. 

Stephenson’s writing is showy, and lacks the care and explicitness that must be 
demanded in methodological writing. Readers may overlook the tiny sentences where 
Stephenson mentions the limitations of his methods. A loose style is especially unfortunate 
in a book advertised as “a treatise on the sociopsychological study of human behavior...
(which) offers experimental foundations for almost all general psychology (and) for the
direct operational study of self-psychology and psychoanalysis.” Specialists in
psychometrics will be dissatisfied with Stephenson’s style, manner, and lack of clarity, but if they
have the patience to disentangle his arguments, they are likely to pick up several ideas 
worthy of more rigorous thought than they have yet received. 


University of Illinois Lee J. Cronbach 
Washington Univ. School of Medicine Goldine C. Gleser


Comments on Cronbach and Gleser’s Review of: The Study of Behavior:
Q-Technique and Its Methodology 


I must thank the Editor for suggesting this reply, and the two reviewers 
for looking into my book so constructively. In a general way I take little 
exception to the review. I am under no illusions about the complexity of the 
matters at issue, and I am indeed grateful that Cronbach and Gleser have 
been prepared to take me seriously. 

My reviewers, however, have perhaps read the book from too concrete 
a standpoint, and thus fail to notice that it requires a rather more abstract 
treatment. Chess is at issue, rather than a simple set of games of draughts. 
The details of technique, and my illustrations, have to be regarded as indi- 
cations rather than definitive instructions. It was for this reason, indeed, 
that I was careful not to provide any “cook-book” on Q, to the
disappointment of many, I am sure. As Cronbach and Gleser note, the ideas I write
about are for sophisticated investigators to ponder over. But a beginning 
has to be made somewhere, and, in spite of Q’s twenty years of history, I 
regarded the book as a bare beginning. The examples served my immediate 
purposes, provided one recognized them for what they are, namely, rough 
indications of how things should go. Thus, as to many matters of statistical 
detail and nicety I have no quarrel with the reviewers: non-forced distri- 
butions, and new non-parametric devices of various kinds will no doubt be 
useful in particular circumstances. However, the forced-choice procedure is 
still of paramount significance: but it is not because standardization “is
often undesirable” that one uses it; for subjective operations no other
rationale seems possible. And this, if I might say so, remains a source of 
considerable confusion for my reviewers. 

The trouble, however, is not any lack of clarity on my part, but an 
unwillingness on the part of my reviewers to argue from the right premises. 
Non sequitur arguments indeed characterize the more critical sections of the
review. Granted the reviewers’ premises, their conclusions of course follow.
The real point is, however, whether the premises can stick. Thus, it is asserted,
as it was in the earlier study by Cronbach and Gleser, that “Q correlations”
should be discarded as a general technique for measuring “profile similarity.”
I quite agree. But Q, as I define it, in no way deals with profiles in the re- 
viewers’ sense. On the contrary, all my efforts have been directed towards 
the denial that individual differences, which are assumed for profile analysis 
but not for Q, are anywhere required as postulates in Q. The whole point is 
missed if the assumptions about individual differences remain unquestioned, 
as they are for Cronbach and Gleser. A similar, but more serious non sequitur
occurs when reference is made in the review to the testmaker’s judgments in 
constructing test items: it is one thing to try to randomize statements, and 
another to seek for conditions under which an operator’s acts of judgment 
may be regarded as randomly given. I stress the latter as the essential matter
and Cronbach and Gleser the former. The reason for the choice made by the 
reviewers is that “we wish to generalize from statements in a cell to the 
factors the block supposedly represents,” so that randomizing the statements 
is essential. However, it is only Cronbach and Gleser who wish to “generalize”
in this way. I certainly don’t, and I have been at pains to try to say why not. 
It is indeed important to ask what is being “generalized.” If it is assumed
that normative conditions are desirable, and can be reached for the structure 
of a Q-sample, then the reviewers’ act of so-called generalization is merely 
tautologous. One only gets out what one puts in originally. I have nowhere 
regarded generalization as such a naive matter. 

Consider, for example, Freud’s case of Dora (p. 250). I represent her
conflict, and the interactional setting, in a structured sample. But I am not 
thereby constrained to prove the design in any sense, which is what Cronbach 
and Gleser are supposing. On the contrary, I merely wish to use it. That is, 
I would seek to make Dora operate, within the context provided by the 
Q-sample so structured, under many different conditions of instruction, 
none of which need have any specific reference to the design of the Q-sample. 
Dora may make, perhaps, 50 Q-sorts, all for different conditions of experi- 
ment and instruction. One’s interest is in these intercorrelations, and one’s 
hope is to make genuine discoveries, for example, about Dora’s rationalizing 
or projecting behaviors, and certainly not to play an actuarial game of 
psychometric checkers with the logical properties of the Q-sample. Of course 
all the factors can have a certain explanation in terms of the structure— 
but this is not where the scientific interest lies. What one may expect, if the 
context of the Q-sample and the various conditions of instruction are fruitful, 
are some genuine discoveries, that is, some intensive inductions, from which 
real and worth-while generalizations are possible. In the book I certainly 
do make use of the structure of Q-samples and of their variance analysis 
to draw conclusions. But I am aware of the limitations. The psychometrician 
cannot lose sight of the structure, however, whereas I am asking him to look 
a little further ahead, and to see at least the possibility of inductions that 
are not merely properties of the structure of a Q-sample. Cronbach and 
Gleser have not grasped this, and I do not think, again, that I am to blame, 
since the book provides the rationale and illustrations of it. 

I must suggest, indeed, that deep-seated protopostulatory matters 
separate the reviewers and myself, and that from these many of the apparent 
difficulties of my style and their frustrations stem. In their case it stems 
from an impossible search for “constant” conditions. In my case a certain
perversity, no doubt, can be discerned, which, however, I enjoy. My own 
treatment of Q is linked, of course, to the classical psychophysical methods, 
especially to the constant methods and to that of single stimuli in particular. 
But I learned long ago to stop looking for “constant” conditions. My reviewers
still have to take this plunge. Sensations for them, no doubt, are “objective”:
perfumes are pleasant per se and putrefying substances unpleasant objectively.
But even a rose doesn’t smell sweet to an Eskimo, whereas the odors of
putrefaction do; and if this is true of sensations, how much more so must it
be of human behavior, more especially the subjective kind! May I say, then, 
that the specialist in psychometrics that Cronbach so brilliantly represents 
is based upon archaic modes of thought of this “constancy” kind? This was 
the main theme, after all, of my book; so that I shall indeed be unhappy 
if the review directs attention to the lesser but no doubt ingenious schemes 
to which Cronbach and Gleser make such kindly reference and away from the 
more substantial issues about which the review says nothing. 

However, one must be patient, as my reviewers have so notably been. 
What I mind most, however, is that I should be regarded as lacking in clarity. 
For any real lack that remains after honest effort is made to follow my 
arguments, I humbly apologize. But I deny that there is really much sub- 
stance to this criticism: new ideas are perhaps worth pondering over, by 
readers and author alike. And if my critics persist in arguing from premises 
I not only do not make but am at great pains to deny, then I think I know 
where the charge of lack of care and explicitness has at least some of its 
beginnings. I would repeat, however, that I appreciate much of the review, 
and that I echo its words of caution, without, however, agreeing that any 
inroads have been made by the review into the really substantial matters 
of the book. 

A word is due, finally, about factor analysis. The reviewers again, if I 
may say so, are on unhappy grounds here. Far from factor analysis of Q 
correlations appearing to have little value, the situation is otherwise. Again, 
however, it depends upon one’s purposes. Mine are at least clear to me: they 
are to the effect that I use factor methods in complex situations, to help me 
to understand theoretical matters, for example, theoretical matters in
psychoanalysis or in self-theory. Quite rough and ready procedures are 
adequate for this purpose, since one’s real interest is in the psychology, and 
not in any search for strict parameters or the like of sophisticated statisticians. 
Again I would ask for caution about these procedures, but now from my 
reviewers, because they are likely to be quite other than bees in anyone’s 
bonnet. 


The University of Chicago. William Stephenson 
Paul G. Hoel. Introduction to Mathematical Statistics, Second Edition. New York: John
Wiley and Sons, Inc., 1954, pp. xi, 331. 


The first edition of this book was published in 1947. The second “is a rather extensive
revision of the first edition.”

The subject matter dealt with is indicated by the chapter headings: Probability, Na- 
ture of Statistical Methods, Empirical Frequency Distributions of One Variable, Theoretical 
Frequency Distributions of One Variable, Elementary Sampling Theory of One Variable, 
Correlation and Regression, Theoretical Frequency Distribution for Correlation and 
Regression, Testing Goodness of Fit, General Principles for Testing Hypotheses and for 
Estimation, Small Sample Distributions, Statistical Design in Experiments, Nonparametric 
Methods. It is a delight to see that many exercises are provided. 

The level of mathematics required for reading the book is the usual two-semester 
course in the calculus. The treatment of the various subjects is satisfactory, given this 
limitation of mathematical level and the need for dealing with continuous distributions. 

The book is very well written. Its study would have been made easier for many stu- 
dents had more exercises been worked out in the text and, in addition, answers provided for 
some exercises. This was particularly to be desired when approximations are used and appli- 
cations discussed. 

I should like to make four comments for readers of this, or almost any other beginning 
book in mathematical statistics. 

(1) Unless you work out many of the exercises, you won’t really understand or be 
able effectively to apply what the book offers. 

(2) You mustn’t expect to be able to read or study much mathematical statistics 
after this book unless you study more mathematics. Parts of advanced calculus and matrix 
theory will be found to be minimum requirements, except when authors or instructors either 
consciously write down to a lower level or wave their hands. Perhaps some readers of this 
review will not even have studied the calculus and will regard the calculus as a high level 
of mathematics. But in the wonderfully exciting and growing field of statistics today the 
lag in expository writing will continue. In psychology itself, more and more mathematics is 
being used. I can only express the hope that training in and study of mathematics will 
improve sufficiently to keep mathematics from being the barrier to the realization of abilities 
that it has been in the past. 

(3) The present book won’t leave you able to translate your problems of the world of 
experience into statistical form. No book yet in existence does that. But, if you have an 
opportunity to work more or less as an apprentice with someone who can translate well, 
this book will considerably improve your ability to profit by that experience. Also, the book 
will improve your ability to communicate with mathematical statisticians. 

(4) It would be easy to criticize the contents and approach of this or any other intro- 
ductory mathematical statistics book. There is no mention of decision theory. Topics of 
interest in the social sciences aren’t taken up. Some of the mathematics could be done 
“better.” But the book only purports to be an introduction to mathematical statistics. 
The author has faced up to the necessities of both theory and applications while accepting 
a severe limitation on mathematical level. Some compromises were necessary, and it is 
always easy to criticize the compromises of others. 

In this reviewer’s judgment, this is a good introduction to mathematical statistics 
for those who have had the usual two-semester calculus course and wish to study applica- 
tions as well as theory. 


University of Illinois William G. Madow 